Polypeptides having a functional domain of interest and methods of identifying and using same

ABSTRACT

Novel polypeptides having functional domains of interest are described, along with DNA sequences that encode the same. A method of identifying these polypeptides by means of a sequence-independent (that is, independent of the primary sequence of the polypeptide sought), recognition unit-based functional screen is also disclosed. Various applications of the method and of the polypeptides identified are described, including their use in assay kits for drug discovery, modification, and refinement.

This application is a continuation-in-part of co-pending U.S. patent application Ser. No. 08/417,872 filed Apr. 7, 1995, the entire contents of which are incorporated herein by reference.

1. INTRODUCTION

The present invention is directed to polypeptides having a functional domain of interest or functional equivalents thereof. Methods of identifying these polypeptides are described, along with various methods of their use, including but not limited to targeted drug discovery.

2. BACKGROUND OF THE INVENTION

Combinatorial libraries represent exciting new tools in basic science research and drug design. It is possible through synthetic chemistry or molecular biology to generate libraries of complex polymers, with many subunit permutations. There are many guises to these libraries: random peptides, which can be synthesized on plastic pins (Geysen et al., 1987, J. Immunol. Meth. 102:259-274), beads (Lam et al., 1991, Nature 354:82-84) or in a soluble form (Houghten et al., 1991, Nature 354:84-86) or expressed on the surface of viral particles (Cwirla et al., 1990, Proc. Natl. Acad. Sci. USA 87:6378-6382; Kay et al., 1993, Gene 128:59-65; Scott and Smith, 1990, Science 249:386-390); nucleic acids (Ellington and Szostak, 1990, Nature 346:818-822; Gao et al., 1994, Proc. Natl. Acad. Sci. USA 91:11207-11211; Tuerk and Gold, 1990, Science 249:505-510); and small organic molecules (Gordon et al., 1994, J. Med. Chem. 37:1385-1401). These libraries are very useful in mapping protein-protein interactions and discovering drugs.

Phage display has become a powerful method for screening populations of peptides, mutagenized proteins, and cDNAs for members that have affinity to target molecules of interest. It is possible to generate 10⁸-10⁹ different recombinants from which one or more clones can be selected with affinity to antigens, antibodies, cell surface receptors, protein chaperones, DNA, metal ions, etc. Screening libraries is versatile because the displayed elements are expressed on the surface of the virus as capsid-fusion proteins. The most important consequence of this arrangement is that there is a physical linkage between phenotype and genotype. There are several other advantages as well: 1) virus particles which have been isolated from libraries by affinity selection can be regenerated by simple bacterial infection, and 2) the primary structure of the displayed binding peptide or protein can be easily deduced by DNA sequencing of the cloned segment in the viral genome.

Combinatorial peptide libraries have been expressed in bacteriophage. Synthetic oligonucleotides, fixed in length, but with multiple unspecified codons can be cloned into genes III, VI, or VIII of bacteriophage M13 where they are expressed as a plurality of peptide:capsid fusion proteins. The libraries, often referred to as random peptide libraries, can be screened for binding to target molecules of interest. Usually, three to four rounds of screening can be accomplished in a week's time, leading to the isolation of one to hundreds of binding phage.
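To put the size of a random peptide library in perspective, the following minimal sketch compares the theoretical number of distinct peptides of a given length with the 10⁸-10⁹ recombinants mentioned above. It assumes, for illustration only, that each unspecified codon can encode any of the 20 standard amino acids, and it ignores stop codons and codon bias. The function names are hypothetical and do not come from the cited references.

```python
# Illustrative sketch only. Assumption: every unspecified codon can encode
# any of the 20 standard amino acids (stop codons and codon bias ignored).
AMINO_ACIDS = 20


def peptide_diversity(random_positions: int) -> int:
    """Theoretical number of distinct peptides with the given number of random positions."""
    if random_positions < 0:
        raise ValueError("random_positions must be non-negative")
    return AMINO_ACIDS ** random_positions


def fraction_sampled(random_positions: int, recombinants: float) -> float:
    """Upper bound on the fraction of distinct peptides a library of that size could contain."""
    return min(1.0, recombinants / peptide_diversity(random_positions))
```

Under these assumptions, a library of 10⁹ recombinants could at most cover every hexapeptide, but only a few percent of all possible octapeptides.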

The primary structure of the binding peptides is then deduced by nucleotide sequencing of individual clones. Inspection of the peptide sequences sometimes reveals a common motif, or consensus sequence. Generally, this motif when synthesized as a soluble peptide has the full binding activity. Random peptide libraries have successfully yielded peptides that bind to the Fab site of antibodies (Cwirla et al., 1990, Proc. Natl. Acad. Sci. USA 87:6378-6382; Scott and Smith, 1990, Science 249:386-390), cell surface receptors (Doorbar and Winter, 1994, J. Mol. Biol. 244:361-369; Goodson et al., 1994, Proc. Natl. Acad. Sci. USA 91:7129-7133), cytosolic receptors (Blond-Elguindi et al., 1993, Cell 75:717-728), intracellular proteins (Daniels and Lane, 1994, J. Mol. Biol. 243:639-652; Dedman et al., 1993, J. Biol. Chem. 268:23025-23030; Sparks et al., 1994, J. Biol. Chem. 269:23853-23856), DNA (Krook et al., 1994, Biochem. Biophys. Res. Comm. 204:849-854), and many other targets (Winter, 1994, Drug Dev. Res. 33:71-89).

Most vital cellular processes are regulated by the transmission of signals throughout the cell in the form of complex interactions between proteins. As the study of signal transduction, or the flow of information throughout the cell, has broadened and matured, it has become apparent that these protein-protein interactions are often mediated by modular domains within signalling proteins. Src, both the first proto-oncogene product and the first tyrosine kinase discovered (Taylor and Shalloway, 1993, Current Opinion in Genetics and Development 3:26-34), is the prototypic modular domain-containing protein.

Src is a protein tyrosine kinase of 60 kilodaltons and is located at the plasma membrane of cells. It was first discovered in the 1970's to be the oncogenic element of Rous sarcoma virus, and in the 1980's, it was appreciated to be a component of the signal transduction system in animal cells. However, since the identification of viral and cellular forms of Src (i.e., v-Src and c-Src), their respective roles in oncogenesis, normal cell growth, and differentiation have not been completely understood.

In addition to its tyrosine kinase region (sometimes called a Src Homology 1 domain), Src contains two regions that have been found to have functionally and structurally homologous counterparts in a large number of proteins. These regions have been designated the Src Homology 2 (SH2) and Src Homology 3 (SH3) domains. SH2 and SH3 domains are modular in that they fold independently of the protein that contains them, their secondary structure places N- and C-termini close to one another in space, and they appear at variable locations (anywhere from N- to C-terminal) from one protein to the next (Cohen et al., 1995, Cell 80:237-248). SH2 domains have been well-studied and are known to be involved in binding to phosphorylated tyrosine residues (Pawson and Gish, 1992, Cell 71:359-362).

The Src-homology region 3 (SH3) of Src is a domain that is 60-70 amino acids in length and is present in many cellular proteins (Cohen et al., 1995, Cell 80:237-248; Pawson, 1995, Nature 373:573-580). Within Src, the SH3 domain is considered to be a negative inhibitory domain, because c-Src can be activated (i.e., transforming) through mutations in this domain (Jackson et al., 1993, Oncogene 8:1943-1956; Seidel-Dugan et al., 1992, Mol Cell Biol 12:1835-1845).

To deduce the binding specificity of the Abl SH3 domain, a group led by David Baltimore screened cDNA libraries with radiolabeled GST-Abl SH3 fusion protein and identified two binding cDNA clones (Cicchetti et al., 1992, Science 257:803-806). Both clones encoded proteins with proline-rich regions that were later shown to be SH3 binding domains.

Subsequently, others have screened combinatorial peptide libraries and identified peptides that bound to the Src SH3 domain (Yu et al., 1994, Cell 76:933-945; Cheadle et al., 1994, J. Biol. Chem. 269:24034-24039). Using the SH3 domain of Src, Sparks et al., 1994, J. Biol. Chem. 269:23853-23856 screened phage-display random peptide libraries and identified a consensus peptide sequence that binds with specificity and high affinity to the Src SH3 domain.

The consensus from these various studies is that the optimal Src SH3 peptide ligand is RPLPPLP (SEQ ID NO:45). Recently, the structures of the peptide-SH3 domain complexes have been deduced by NMR and the peptides have been shown to bind in two possible orientations with respect to the SH3 domain (Feng et al., 1994, Science 266:1241-1247; Lim et al., 1994, Nature 372:375-379).

Since SH3 domains have been found to have such important roles in the function of crucial signalling and structural elements in the cell, a method of identifying proteins containing SH3 regions is of great interest. In this regard, it is important to note that such a method is unavailable because of the low sequence similarity of modular functional domains, including SH3. See, e.g., FIG. 6, which illustrates the minimal primary sequence homology among various known SH3 domains.

Sequence homology searches can potentially identify known proteins containing not yet recognized functional domains of interest; however, sequence homology generally needs to be >40% for this procedure to be successful. Functional domains generally are less than 40% homologous and therefore many would be missed in a sequence homology search. In addition, homology searches do not identify novel proteins; they only identify proteins already defined by nucleotide or amino acid sequence and present in the database.
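The 40% criterion can be made concrete with a short sketch that computes percent identity between two pre-aligned sequences and checks it against a threshold. This is an illustration under stated assumptions, not the procedure used by any particular homology search program: the sequences are assumed to be already aligned and of equal length, a period marks a gap, identity is counted only over ungapped columns, and the function names and default threshold are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = ".") -> float:
    """Percent of identical residues over the ungapped columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def likely_found_by_homology(aligned_a: str, aligned_b: str,
                             threshold: float = 40.0) -> bool:
    """True when identity exceeds the threshold (40% is the figure quoted in the text)."""
    return percent_identity(aligned_a, aligned_b) > threshold
```

Two domains that share the same binding function but fall at or below the threshold would be missed by such a test, which is the limitation described above.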

Another approach is to use hybridization techniques using nucleotide probes to search expression libraries for novel proteins. This method would have limited applicability to finding novel proteins containing functional domains due to the low sequence homology of the functional domains.

Methods for isolating partner proteins involved in protein-protein interactions have generally focused on finding a ligand to a protein that has been found and characterized. Such approaches have included using anti-idiotypic antibodies that mimic the known protein to screen cDNA expression libraries for a binding ligand (Jerne, 1974, Ann. Immunol. (Inst. Pasteur) 125c:373-389; Sudol, 1994, Oncogene 9:2145-2152). Skolnick et al., 1991, Cell 65:83-90 isolated a binding partner for PI3-kinase by screening a cDNA expression library with the ³²P-labeled tyrosine phosphorylated carboxyl terminus of the epidermal growth factor receptor (EGFR).

An easy method for isolating operationally defined ligands involved in protein-protein interactions and for optimally identifying an exhaustive set of modular domain-containing proteins implicated in binding with the ligands would be highly desirable.

If such a method were available, it would be useful for the isolation of any polypeptide having a functioning version of any functional domain of interest. Such a general method would be of tremendous utility, in that whole families of related proteins, each with its own version of the functional domain of interest, could be identified. Knowledge of such related proteins would contribute greatly to our understanding of various physiological processes, including cell growth or death, malignancy, and immune reactions, to name a few. Such a method would also contribute to the development of increasingly more effective therapeutic, diagnostic, or prophylactic agents having fewer side effects.

According to the present invention, just such a method is provided.

Regarding SH3 domain-containing proteins, the method of the present invention will contribute greatly to our understanding of cell growth (Zhu et al., 1993, J. Biol. Chem. 268:1775-1779; Taylor and Shalloway, 1994, Nature 368:867-871), malignancy (Wages et al., 1992, J. Virol. 66:1866-1874; Bruton and Workman, 1993, Cancer Chemother. Pharmacol. 32:1-19), subcellular localization of proteins to the cytoskeleton and/or cellular membranes (Weng et al., 1993, J. Biol. Chem. 268:14956-14963; Bar-Sagi et al., 1993, Cell 74:83-91), signal transduction (Duchesne et al., 1993, Science 259:525-528), cell morphology (Wages et al., 1992, J. Virol. 66:1866-1874; McGlade et al., 1993, EMBO J. 12:3073-3081), neuronal differentiation (Tanaka et al., 1993, Mol. Cell. Biol. 13:4409-4415), T cell activation (Reynolds et al., 1992, Oncogene 7:1949-1955), and cellular oxidase activity (McAdara and Babior, 1993, Blood 82:A28).

Citation of a reference hereinabove shall not be construed as an admission that such is prior art to the present invention.

3. SUMMARY OF THE INVENTION

In general, the present invention is directed to a method of using isolated, operationally defined ligands involved in binding interactions for optimally identifying an exhaustive set of compounds binding to such ligands. In one embodiment, the isolated ligands are peptides involved in specific protein-protein interactions and are used to identify a set of novel modular domain-containing proteins that bind to the ligands. Using this method, proteins sharing only modest similarities but a common function can be found.

The present invention is directed to a method of identifying a polypeptide or family of polypeptides having a functional domain of interest. The basic steps of the method comprise: (a) choosing a recognition unit or set of recognition units having a selective affinity for a target molecule with a functional domain of interest; (b) contacting the recognition unit with a plurality of polypeptides; and (c) identifying a polypeptide having a selective binding affinity for the recognition unit, which polypeptide includes the functional domain of interest or a functional equivalent thereof.
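Steps (a) through (c) can be pictured as a simple filter over a collection of candidates. The sketch below is a conceptual illustration only and not the laboratory procedure: the recognition unit is modeled as a hypothetical scoring function returning a binding signal, the polypeptides are plain labels, and the cutoff is an arbitrary assumed value.

```python
from typing import Callable, Iterable, List


def identify_binders(
    polypeptides: Iterable[str],
    binding_signal: Callable[[str], float],
    cutoff: float,
) -> List[str]:
    """Return the polypeptides whose signal with the chosen recognition unit meets the cutoff.

    binding_signal stands in for step (b), contacting the recognition unit
    with each polypeptide; the comparison with cutoff stands in for step (c).
    """
    return [p for p in polypeptides if binding_signal(p) >= cutoff]


# Hypothetical toy data: clone labels and made-up signal values.
toy_signals = {"clone_1": 0.9, "clone_2": 0.1, "clone_3": 0.6}
binders = identify_binders(toy_signals, lambda p: toy_signals[p], cutoff=0.5)
```

Note that the filter never inspects the primary sequence of a candidate, which mirrors the sequence-independent character of the screen.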

In one particular embodiment of the invention, exhaustive screening of proteins having a desired functional domain involves an iterative process by which ligands or recognition units for SH3 domains identified in the first round of screening are used to detect SH3 domain-containing proteins in successive expression library screens.

More particularly, the method of the present invention includes choosing a recognition unit having a selective affinity for a target molecule with a functional domain of interest. With this recognition unit (particularly under the multivalent recognition unit screening conditions taught by the present invention), it has further been discovered that a plurality of polypeptides from various sources can be examined such that certain polypeptides having a selective binding affinity for the recognition unit can be identified. The polypeptides so identified have been shown to include the functional domain of interest; that is, the functional domains found are working versions that are capable of displaying the same binding specificity as the functional domain of interest. Hence, the polypeptides identified by the present method also possess those attributes of the functional domain of interest which allow these related polypeptides to exhibit the same, similar, or analogous (but functionally equivalent) selective affinity characteristics as the domain of interest of the initial target molecule. By screening the plurality of peptides for recognition unit binding, the methods of the present invention circumvent the limitations of conventional DNA-based screening methods and allow for the identification of highly disparate protein sequences possessing functionally equivalent functional domains.

In specific embodiments of the present invention, the plurality of polypeptides is obtained from the proteins present in a cDNA expression library. The specificity of the polypeptides which bear the functional domain of interest or a functional equivalent thereof for various peptides or recognition units can subsequently be examined, allowing for a greater understanding of the physiological role of particular polypeptide/recognition unit interactions. Indeed, the present invention provides a method of targeted drug discovery based on the observed effects of a given drug candidate on the interaction between a recognition unit-polypeptide pair or a recognition unit and a “panel” of related polypeptides each with a copy or a functional equivalent of (e.g., capable of displaying the same binding specificity and thus binding to the same recognition unit as) the functional domain of interest.

The present invention also provides polypeptides comprising certain amino acid sequences. Moreover, the present invention also provides nucleic acids, including certain DNA constructs comprising certain coding sequences. Using the methods of the present invention, more than eighteen different SH3 domain-containing proteins have been identified, over half of which have not been previously described.

The present inventors have found, unexpectedly, that the valency (i.e., whether it is a monomer, dimer, tetramer, etc.) of the recognition unit that is used to screen an expression library or other source of polypeptides apparently has a marked effect upon the specificity of the recognition unit-functional domain interaction. The present inventors have discovered that recognition units in the form of small peptides, in multivalent form, have a specificity that is eased but not forfeited. In particular, biotinylated peptides bound to a multivalent (believed to be tetravalent) streptavidin-alkaline phosphatase complex have an unexpected generic specificity. This allows such peptides to be used to screen libraries to identify classes of polypeptides containing functional domains that are similar but not identical in sequence to the peptides' original target functional domains.

The present invention also provides methods for identifying potential new drug candidates (and potential lead compounds) and determining the specificities thereof. For example, knowing that a polypeptide with a functional domain of interest and a recognition unit, e.g., a binding peptide, exhibit a selective affinity for each other, one may attempt to identify a drug that can exert an effect on the polypeptide-recognition unit interaction, e.g., either as an agonist or as an antagonist (inhibitor) of the interaction. With this assay, then, one can screen a collection of candidate “drugs” for the one exhibiting the most desired characteristic, e.g., the most efficacious in disrupting the interaction or in competing with the recognition unit for binding to the polypeptide.

In addition, the present invention also provides certain assay kits and methods of using these assay kits for screening drug candidates for their ability to affect the binding of a polypeptide containing a functional domain to a recognition unit. In a particular aspect of the present invention, the assay kit comprises: (a) a polypeptide containing a functional domain of interest; and (b) a recognition unit having a selective binding affinity for the polypeptide. Yet another assay kit may comprise a plurality of polypeptides, each polypeptide containing a functional domain of interest, in which the functional domain of interest is a domain selected from the group consisting of an SH1, SH2, SH3, PH, PTB, LIM, armadillo, Notch/ankyrin repeat, zinc finger, leucine zipper, and helix-turn-helix, and at least one recognition unit having a selective affinity for each of the plurality of polypeptides.

Other objects of the present invention will be apparent to those of ordinary skill upon further consideration of the following detailed description.

4. DESCRIPTION OF THE FIGURES

FIG. 1 is a schematic representation of the general aspects of a method of identifying recognition units exhibiting a selective affinity for a target molecule with a functional domain of interest. In this illustration, the target molecule is a polypeptide with an SH3 domain, and the recognition units are peptides having a selective affinity for the SH3 domain that are expressed in a phage displayed library.

FIG. 2 illustrates the selectivities exhibited by particular recognition units that bind to the Src SH3 domain (in this case, two heptapeptides) for a “panel” of polypeptides known to contain an SH3 domain. The non-SH3-containing protein, GST, serves as control. RPLPPLP is SEQ ID NO:45; APPVPPR is SEQ ID NO:203.

FIG. 3 is a schematic representation of the general method of identifying polypeptides with a functional domain of interest by screening a plurality of polypeptides using a suitable recognition unit. In the illustration, the plurality of polypeptides is obtained from a cDNA expression library, and the recognition units are SH3 domain-binding peptides.

FIG. 4 illustrates how an SH3 domain-binding peptide can be used to identify other SH3 domain-containing proteins. Shown is a schematic representation of the progression from initial selection of a target molecule with a functional domain of interest, choice of recognition unit, and identification of polypeptides that have a selective affinity for the recognition unit and include the functional domain of interest or a functional equivalent thereof.

FIG. 5 depicts filters from primary (FIG. 5B) and tertiary (FIG. 5A) screens of a λ cDNA library probed with a biotinylated SH3-binding peptide recognition unit in the form of a complex with streptavidin-alkaline phosphatase (SA-AP). A mouse 16 day embryo cDNA library in λEXlox was incubated with a multivalent complex formed between biotinylated pSrcCII and SA-AP. The sites of peptide binding were detected by incubation with BCIP (5-bromo-4-chloro-3-indolyl-phosphate-p-toluidine salt) and NBT (nitroblue tetrazolium chloride) for approximately five minutes.

FIG. 6 shows an alignment of SH3 domains that illustrates the minimal primary sequence homology among various known SH3 domains. The amino acid sequences shown are SEQ ID NOs:68-111.

FIG. 7A is a schematic representation of a population of functional domains represented by the circles. “A” is a recognition unit specific to one circle only. B, on the other hand, recognizes three domains, while B1 and B2 recognize only two each. FIG. 7B illustrates an iterative method whereby new recognition units are chosen based on polypeptides uncovered with the first recognition unit(s). These new recognition units lead to the identification of other related polypeptides, etc., expanding the scope of the study to increasingly diverse members of the related population.

FIG. 8 illustrates the binding specificity of several SH3 domain recognition units. Biotinylated Class I (pSrcCI) or Class II (pSrcCII) Src SH3 domain recognition units, Crk SH3 domain recognition units (pCrk), PLCγ SH3 domain recognition units (pPLC) and Abl SH3 domain recognition units (pAbl) were tested for binding to the indicated GST-SH3 domain fusion proteins immobilized onto duplicate microtiter plate wells. Recognition units are listed along the left side of the figure; GST-SH3 domain fusion proteins are listed along the bottom. Recognition units were incubated either as multivalent complexes of biotinylated peptides and streptavidin-horseradish peroxidase (SA-HRP) (complexed) or as monovalent biotinylated peptides (uncomplexed), followed by incubation with SA-HRP. Average optical densities are shown.

FIG. 9 shows a schematic of SH3-domain containing proteins isolated using the present invention. The name, identity, type of screen, and number of individual clones derived for each sequence are indicated. Diagrams are to scale, with SH3 domains representing approximately 60 amino acids. The abbreviations AR, P, CR, E/P, and SH2 represent ankyrin repeats, proline-rich segments, Cortactin repeats, glutamate/proline-rich segments, and Src homology 2 domains, respectively. Flared ends represent putative translation initiation sites for individual cDNAs. The Mouse, Human 1, and Human 2 libraries correspond to mouse 16 day embryo, human bone marrow, and human prostate cancer cDNA libraries, respectively. For a description of the pSrcII and pCort recognition units, see Section 6.1.

FIGS. 10A and 10B depict the sequence alignment of SH3 domains in proteins isolated using the present invention. The name and identity of each clone are indicated. Where appropriate, multiple SH3 domains from the same polypeptide are designated A, B, C, etc., from N- to C-terminal. Periods indicate gaps introduced to maximize alignment of similar residues. Positions corresponding to conserved residues shown to be involved in ligand binding in the SH3 domains of Src and Grb2/Sem5 (Tomasetto et al., 1995, Genomics 28:367-376) are presented in bold and underlined, respectively. Primary structures of SH3P1-8 and SH3P10-13 correspond to mouse, SH3P15-18, clone 5, 34, 40, 41, 45, 53, 55, 56, and 65 to human, and SH3P9 and SH3P14 to mouse (m) or human (h) cDNA clones. For sequence comparison, the sequence of the mouse c-Src SH3 domain (GenBank accession number P41240) is shown. The GenBank accession numbers for mouse Cortactin, SPY75/HS1, Crk, and human MLN50, Lyn, Fyn, and Src are U03184, D42120, S72408, X82456, M16038, P06241, and P41240, respectively. The amino acid sequences shown are SEQ ID NOs:112-140.

FIG. 11 depicts the specificity continuum described in Section 5.2.1. “SA-AP peptide complex” represents the multivalent (believed to be tetravalent) complex of streptavidin-alkaline phosphatase and biotinylated peptide described in that section.

FIG. 12 depicts the results of experiments in which peptide recognition units were synthesized and tested for their ability to bind to novel SH3 domains described in Sections 6.1 and 6.1.1. A minus indicates no binding; a plus indicates binding, with the number of pluses indicating the strength of binding. For further details, see Section 6.2. The amino acid sequences shown are SEQ ID NOs:141-168.

FIG. 13 depicts more data from the experiment depicted in FIG. 12. The amino acid sequences shown are SEQ ID NOs:169-188.

FIG. 14 illustrates the effect of preconjugation with streptavidin-alkaline phosphatase on the affinity of biotinylated peptides for SH3 domains. See Section 6.3.1 for details.

FIG. 15 illustrates the effect of preconjugation with streptavidin-alkaline phosphatase on the specificity of biotinylated peptides for GST-SH3 domain fusion proteins that have been immobilized on nylon membranes. See Section 6.3.2 for details.

FIG. 16 illustrates the effect of preconjugation with streptavidin-alkaline phosphatase on the specificity of biotinylated peptides for proteins containing SH3 domains expressed by cDNA clones. See Section 6.3.3 for details.

FIG. 17 illustrates a strategy for exhaustively screening an expression library for SH3 domain-containing proteins. A peptide recognition unit is generated by screening a combinatorial peptide library for binders to an SH3 domain expressed bacterially as a GST fusion protein. This peptide is then used as a multivalent streptavidin-biotinylated peptide complex to screen for a subset of the SH3 domain-containing proteins represented in a cDNA expression library. A combinatorial library is once again used to identify recognition units of SH3 domains identified in the first expression library screen; these recognition units identify overlapping sets of proteins from the expression library. With multiple iterations of this process, it should be possible to clone systematically all SH3 domains represented in a given cDNA expression library.
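The iterative strategy of FIG. 17 can be viewed as repeatedly expanding a set of discovered domains until no new ones appear. The following sketch is a conceptual model only, under stated assumptions: the two lookup tables stand in for the expression library screen and the combinatorial library screen, respectively, and all names and toy entries are hypothetical.

```python
from typing import Dict, Iterable, Set


def exhaustive_screen(
    initial_unit: str,
    binds: Dict[str, Iterable[str]],
    units_for_domain: Dict[str, Iterable[str]],
) -> Set[str]:
    """Collect every domain reachable by alternating the two kinds of screen.

    binds maps a recognition unit to the domains it detects in the
    expression library; units_for_domain maps a newly found domain to the
    recognition units obtained for it from a combinatorial library.
    """
    found: Set[str] = set()
    tried: Set[str] = set()
    pending = [initial_unit]
    while pending:
        unit = pending.pop()
        if unit in tried:
            continue  # each recognition unit is screened only once
        tried.add(unit)
        for domain in binds.get(unit, ()):
            if domain not in found:
                found.add(domain)
                pending.extend(units_for_domain.get(domain, ()))
    return found


# Hypothetical toy data: unit u3 is never derived, so domain d4 stays unfound.
toy_binds = {"u1": ["d1", "d2"], "u2": ["d2", "d3"], "u3": ["d4"]}
toy_units = {"d2": ["u2"]}
discovered = exhaustive_screen("u1", toy_binds, toy_units)
```

In this model the process terminates when a round yields no recognition unit that has not already been tried, which corresponds to the point at which further iterations uncover no additional domains.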

FIG. 18 depicts the nucleotide sequence of SH3P1, mouse p53 bp2 (SEQ ID NO:5).

FIG. 19 depicts the amino acid sequence of SH3P1, mouse p53 bp2 (SEQ ID NO:6).

FIG. 20 depicts the nucleotide sequence of SH3P2, a novel mouse gene (SEQ ID NO:7).

FIG. 21 depicts the amino acid sequence of SH3P2, a novel mouse gene (SEQ ID NO:8).

FIG. 22 depicts the nucleotide sequence of SH3P3, a novel mouse gene (SEQ ID NO:9).

FIG. 23 depicts the amino acid sequence of SH3P3, a novel mouse gene (SEQ ID NO:10).

FIG. 24 depicts the nucleotide sequence of SH3P4, a novel mouse gene (SEQ ID NO:11).

FIG. 25 depicts the amino acid sequence of SH3P4, a novel mouse gene (SEQ ID NO:12).

FIG. 26 depicts the nucleotide sequence of SH3P5, mouse Cortactin (SEQ ID NO:13).

FIG. 27 depicts the amino acid sequence of SH3P5, mouse Cortactin (SEQ ID NO:14).

FIG. 28 depicts the nucleotide sequence of SH3P6, mouse MLN50 (SEQ ID NO:15).

FIG. 29 depicts the amino acid sequence of SH3P6, mouse MLN50 (SEQ ID NO:16).

FIG. 30 depicts the nucleotide sequence of SH3P7, a novel mouse gene (SEQ ID NO:17).

FIG. 31 depicts the amino acid sequence of SH3P7, a novel mouse gene (SEQ ID NO:18).

FIG. 32 depicts the nucleotide sequence of SH3P8, a novel mouse gene (SEQ ID NO:19).

FIG. 33 depicts the amino acid sequence of SH3P8, a novel mouse gene (SEQ ID NO:20).

FIG. 34 depicts the nucleotide sequence of SH3P9, a novel mouse gene (SEQ ID NO:21).

FIG. 35 depicts the amino acid sequence of SH3P9, a novel mouse gene (SEQ ID NO:22).

FIG. 36 depicts the nucleotide sequence of SH3P9, a novel human gene (SEQ ID NO:23).

FIG. 37 depicts the amino acid sequence of SH3P9, a novel human gene (SEQ ID NO:24).

FIG. 38 depicts the nucleotide sequence of SH3P10, mouse HS1 (SEQ ID NO:25).

FIG. 39 depicts the amino acid sequence of SH3P10, mouse HS1 (SEQ ID NO:26).

FIG. 40 depicts the nucleotide sequence of SH3P11, mouse Crk (SEQ ID NO:27).

FIG. 41 depicts the amino acid sequence of SH3P11, mouse Crk (SEQ ID NO:28).

FIG. 42A depicts the nucleotide sequence from positions 1-2600 of SH3P12, a novel mouse gene (a portion of SEQ ID NO:29).

FIG. 42B depicts the nucleotide sequence from positions 2601-3335 of SH3P12, a novel mouse gene (a portion of SEQ ID NO:29).

FIG. 43 depicts the amino acid sequence of SH3P12, a novel mouse gene (SEQ ID NO:30).

FIG. 44 depicts the nucleotide sequence of SH3P13, a novel mouse gene (SEQ ID NO:31).

FIG. 45 depicts the amino acid sequence of SH3P13, a novel mouse gene (SEQ ID NO:32).

FIG. 46A depicts the nucleotide sequence from positions 1-2400 of SH3P14, mouse H74 (a portion of SEQ ID NO:33).

FIG. 46B depicts the nucleotide sequence from positions 2351-4091 of SH3P14, mouse H74 (a portion of SEQ ID NO:33).

FIG. 47 depicts the amino acid sequence of SH3P14, mouse H74 (SEQ ID NO:34).

FIG. 48 depicts the nucleotide sequence of SH3P14, human H74 (SEQ ID NO:35).

FIG. 49 depicts the amino acid sequence of SH3P14, human H74 (SEQ ID NO:36).

FIG. 50 depicts the nucleotide sequence of SH3P17, a novel human gene (SEQ ID NO:37).

FIG. 51 depicts the amino acid sequence of SH3P17, a novel human gene (SEQ ID NO:38).

FIG. 52A depicts the nucleotide sequence of SH3P18, a novel human gene (SEQ ID NO:39).

FIG. 53 depicts the amino acid sequence of SH3P18, a novel human gene (SEQ ID NO:40).

FIG. 54 depicts the nucleotide sequence of clone 55, a novel human gene (SEQ ID NO:189).

FIG. 55 depicts the amino acid sequence of clone 55, a novel human gene (SEQ ID NO:190).

FIG. 56 depicts the nucleotide sequence of clone 56, a novel human gene (SEQ ID NO:191).

FIG. 57 depicts the amino acid sequence of clone 56, a novel human gene (SEQ ID NO:192).

FIG. 58A depicts the nucleotide sequence from position 1-1720 of clone 65, a novel human gene (a portion of SEQ ID NO:193).

FIG. 58B depicts the nucleotide sequence from position 1721-2873 of clone 65, a novel human gene (a portion of SEQ ID NO:193).

FIG. 59 depicts the amino acid sequence of clone 65, a novel human gene (SEQ ID NO:194).

FIG. 60 depicts the nucleotide sequence of clone 34, a novel human gene (SEQ ID NO:195).

FIG. 61A depicts a portion of the amino acid sequence of clone 34, a novel human gene (a portion of SEQ ID NO:196).

FIG. 61B depicts a portion of the amino acid sequence of clone 34, a novel human gene (a portion of SEQ ID NO:196).

FIG. 62 depicts the nucleotide sequence of clone 41, a novel human gene (SEQ ID NO:197).

FIG. 63A depicts a portion of the amino acid sequence of clone 41, a novel human gene (a portion of SEQ ID NO:198).

FIG. 63B depicts a portion of the amino acid sequence of clone 41, a novel human gene (a portion of SEQ ID NO:198).

FIG. 64A depicts the nucleotide sequence of clone 53, a novel human gene (SEQ ID NO:199).

FIG. 65A depicts a portion of the amino acid sequence of clone 53, a novel human gene (a portion of SEQ ID NO:200).

FIG. 65B depicts a portion of the amino acid sequence of clone 53, a novel human gene (a portion of SEQ ID NO:200).

FIGS. 66A and 66B depict the nucleotide sequence (SEQ ID NO:220) and amino acid sequence (SEQ ID NO:221) of clone 5, a novel human gene.

5. DETAILED DESCRIPTION OF THE INVENTION

As stated above, the present invention is related broadly to certain polypeptides having a functional domain of interest and is directed to methods of identifying and using these polypeptides. The present invention is also directed to a method of using isolated, operationally defined ligands involved in binding interactions for optimally identifying an exhaustive set of compounds binding such ligands and to compounds, target molecules, and, in one embodiment, polypeptides having a functional domain of interest and to methods of using these compounds. The detailed description that follows is provided to elucidate the invention further and to assist further those of ordinary skill who may be interested in practicing particular aspects of the invention.

First, certain definitions are in order. Accordingly, the term“polypeptide” refers to a molecule comprised of amino acid residuesjoined by peptide (i.e., amide) bonds and includes proteins andpeptides. Hence, the polypeptides of the present invention may havesingle or multiple chains of covalently linked amino acids and mayfurther contain intrachain or interchain linkages comprised of disulfidebonds. Some polypeptides may also form a subunit of a multiunitmacromolecular complex. Naturally, the polypeptides can be expected topossess conformational preferences and to exhibit a three-dimensionalstructure. Both the conformational preferences and the three-dimensionalstructure will usually be defined by the polypeptide's primary (i.e.,amino acid) sequence and/or the presence (or absence) of disulfide bondsor other covalent or non-covalent intrachain or interchain interactions.

The polypeptides of the present invention can be any size. As can beexpected, the polypeptides can exhibit a wide variety of molecularweights, some exceeding 150 to 200 kilodaltons (kD). Typically, thepolypeptides may have a molecular weight ranging from about 5,000 toabout 100,000 daltons. Still others may fall in a narrower range, forexample, about 10,000 to about 75,000 daltons, or about 20,000 to about50,000 daltons.

The phrase “functional domain” refers to a region of a polypeptide whichaffords the capacity to perform a particular function of interest. Thisfunction may give rise to a biological, chemical, or physiologicalconsequence that may be reversible or irreversible and which mayinclude, but not be limited to, protein-protein interactions (e.g.,binding interactions) involving the functional domain, a change in theconformation or a transformation into a different chemical state of thefunctional domain or of molecules acted upon by the functional domain,the transduction of an intracellular or intercellular signal, theregulation of gene or protein expression, the regulation of cell growthor death, or the activation or inhibition of an immune response.Furthermore, the functional domain of interest is defined by aparticular functional domain that is present in a given target molecule.A discussion of the selection of a particular functionaldomain-containing target molecule is presented further below.

Many functional domains tend to be modular in that such domains mayoccur one or more times in a given polypeptide (or target molecule) ormay be found in a family of different polypeptides. When found more thanonce in a given polypeptide or in different polypeptides, the modularfunctional domain may possess substantially the same structure, in termsof primary sequence and/or three-dimensional space, or may containslight or great variations or modifications among the different versionsof the functional domain of interest.

What is important, however, is that these related functional domainsretain the functional aspects of the functional domain of interestpresent in the target molecule. It is stressed that, indeed, it is thisfunctional relationship among two or more possible versions of afunctional domain of interest which may be identified, defined, andexploited by the methods of the present invention. In a preferredaspect, the function of interest is the ability to bind to a molecule(e.g., a peptide) of interest.

The present invention provides a general strategy by which recognition units that bind to a functional domain-containing molecule can be used to screen expression libraries of genes (e.g., cDNA, genomic libraries) systematically for novel functional domain-containing proteins. In specific embodiments, the recognition units are previously isolated from a random peptide library, or are known peptide ligands or recognition units, or are recognition units that are identified by database searches for sequences having homology to a peptide recognition unit having the binding specificity of interest. Using the methods of the present invention, it is possible to exhaustively screen an expression library for proteins with a given functional domain.

In the prior art, novel genes (and thus their encoded protein products) are most commonly identified from cDNA libraries. Generally, an appropriate cDNA library is screened with a probe that is either an oligonucleotide or an antibody. In either case, the probe must be specific enough for the gene that is to be identified to pick that gene out from a vast background of non-relevant genes in the library. It is this need for a specific probe that is the highest hurdle that must be overcome in the prior art identification of novel genes. Another method of identifying genes from cDNA libraries is through use of the polymerase chain reaction (PCR) to amplify a segment of a desired gene from the library. PCR requires that oligonucleotides having sequence similarity to the desired gene be available.

If the probe used in prior art methods is a nucleic acid, the cDNA library may be screened without the need for expressing any protein products that might be encoded by the cDNA clones. If the probe used in prior art methods is an antibody, then it is necessary to build the cDNA library into a suitable expression vector. For a comprehensive discussion of the art of identifying genes from cDNA libraries, see Sambrook, Fritsch, and Maniatis, “Construction and Analysis of cDNA Libraries,” Chapter 8 in Cloning, A Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory Press, 1989. See also Sambrook, Fritsch, and Maniatis, “Screening Expression Libraries with Antibodies and Oligonucleotides,” Chapter 12 in Cloning, A Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory Press, 1989.

As an alternative to cDNA libraries, genomic libraries are used. When genomic libraries are used in prior art methods, the probe is virtually always a nucleic acid probe. See Sambrook, Fritsch, and Maniatis, “Analysis and Cloning of Eukaryotic Genomic DNA,” Chapter 9 in Cloning, A Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory Press, 1989.

In the prior art, nucleic acid probes used in screening libraries are often based upon the sequence of a known gene that is thought to be homologous to a gene that it is desired to isolate. The success of the procedure depends upon the degree of homology between the probe and the target gene being sufficiently high. Probes based upon the sequences of known functional domains in proteins had limited value because, while the sequences of the functional domains were similar enough to allow for their recognition as shared domains, the similarity was not so high that the probes could be used to screen cDNA or genomic libraries for genes containing the functional domains.

PCR may also be used to identify genes from genomic libraries. However, as in the case of using PCR to identify genes from cDNA libraries, this requires that oligonucleotides having sequence similarity to the desired gene be available.

Using the screening methods provided by the present invention, DNA encoding proteins having a desired functional domain that would not be readily identified by sequence homology can be identified by functional binding specificity to recognition units. By virtue of an ease in specificity of binding requirements conferred by the screening methods of the present invention, many novel, functionally homologous, functional domain-containing proteins can be identified. Although not intending to be bound by any mechanistic explanation, this ease in binding specificity is believed to be the result of the use of a multivalent peptide recognition unit used to screen the gene library, preferably of a valency greater than bivalent, more preferably tetravalent or greater, and most preferably the streptavidin-biotinylated peptide recognition unit complex.

In one particular embodiment of the invention, exhaustive screening of proteins having a desired functional domain involves an iterative process by which recognition units for SH3 domains identified in the first round of screening are used to detect SH3 domain-containing proteins in successive expression library screens (see FIG. 17). This strategy enables one to search “sequence space” in what might be thought of as ever-widening circles with each successive cycle. This iterative strategy can be initiated even when only one functional domain-containing protein and recognition unit are available.
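The iterative strategy described above can be pictured as a loop that alternates between selecting recognition units for the newly found proteins and screening the library with those recognition units. The following Python sketch is only an illustration of that control flow under stated assumptions: the two callables stand in for wet-lab steps (peptide library selection and expression library screening), and all names and toy data (`iterative_screen`, `select_ligands`, `screen_library`, `TOY_LIGANDS`, `TOY_BINDERS`) are hypothetical and do not come from this disclosure.

```python
from typing import Callable, Dict, Iterable, List, Set


def iterative_screen(
    seed_protein: str,
    select_ligands: Callable[[str], Iterable[str]],
    screen_library: Callable[[str], Iterable[str]],
    max_rounds: int = 5,
) -> Set[str]:
    """Expand a set of domain-containing proteins round by round.

    select_ligands: hypothetical stand-in for selecting recognition units
        that bind a given protein (e.g., from a random peptide library).
    screen_library: hypothetical stand-in for screening an expression
        library with one recognition unit and returning the binders.
    """
    found = {seed_protein}
    frontier = {seed_protein}
    for _ in range(max_rounds):
        # Recognition units for the proteins found in the previous round.
        ligands = {lig for prot in frontier for lig in select_ligands(prot)}
        # Proteins in the library that bind any of those recognition units.
        hits = {prot for lig in ligands for prot in screen_library(lig)}
        frontier = hits - found  # keep only proteins not seen before
        if not frontier:
            break  # no new proteins: the search has stopped widening
        found |= frontier
    return found


# Toy lookup tables standing in for experimental results (illustrative only).
TOY_LIGANDS: Dict[str, List[str]] = {"Src": ["pepA"], "P1": ["pepB"], "P2": []}
TOY_BINDERS: Dict[str, List[str]] = {"pepA": ["Src", "P1"], "pepB": ["P1", "P2"]}

result = iterative_screen(
    "Src",
    lambda prot: TOY_LIGANDS.get(prot, []),
    lambda lig: TOY_BINDERS.get(lig, []),
)
```

In this toy run the first cycle recovers `P1` through `pepA`, and the second cycle recovers `P2` through `pepB`, mirroring the "ever-widening circles" starting from a single protein and recognition unit.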

This iterative process is not limited to proteins containing SH3 domains. Members within a class of other functional domains also tend to have overlapping, or at least similar, recognition unit preferences, are structurally stable, and often confer similar binding properties to a wide variety of proteins. These characteristics predict that the methods of the present invention will be applicable to a wide variety of functional domain-containing proteins in addition to their applicability to SH3 domain-containing proteins.

5.1. Discovery of Novel Genes and Polypeptides Containing Functional Domains

The present invention provides methods for the identification of one or more polypeptides (in particular, a “family” of polypeptides, including the target molecule) that contain a functional domain of interest that either corresponds to or is the functional equivalent of a functional domain of interest present in a predetermined target molecule.

The present invention provides a mechanism for the rapid identification of genes (e.g., cDNAs) encoding virtually any functional domain of interest. By screening cDNA libraries or other sources of polypeptides for recognition unit binding rather than sequence similarity, the present invention circumvents the limitations of conventional DNA-based screening methods and allows for the identification of highly disparate protein sequences possessing equivalent functional activities. The ability to isolate entire repertoires of proteins containing particular modular functional domains will prove invaluable both in molecular biological investigations of the genome and in bringing new targets into drug discovery programs.

It should likewise be apparent that a wide range of polypeptides having a functional domain of interest can be identified by the process of the invention, which process comprises:

(a) contacting a multivalent recognition unit complex with a plurality of polypeptides; and

(b) identifying a polypeptide having a selective binding affinity for said recognition unit complex.

In a specific embodiment, the process comprises:

(a) contacting a multivalent recognition unit complex with a plurality of polypeptides from which it is desired to identify a polypeptide having selective binding affinity for the recognition unit, in which the valency of the recognition unit in the complex is at least two, or at least four; and

(b) identifying, and preferably recovering, a polypeptide having a selective binding affinity for the recognition unit complex.

In another specific embodiment, the process comprises a method of identifying at least one polypeptide comprising a functional domain of interest, said method comprising:

(a) contacting one or more multivalent recognition unit complexes with a plurality of polypeptides; and

(b) identifying at least one polypeptide having selective binding affinity for at least one of said recognition unit complexes.

In another specific embodiment, the process comprises:

(a) contacting a multivalent recognition unit complex, which complex comprises (i) avidin or streptavidin, and (ii) biotinylated recognition units, with a plurality of polypeptides from a cDNA expression library, in which the recognition unit is a peptide having in the range of 6 to 60 amino acid residues; and

(b) identifying a polypeptide having a selective binding affinity for said recognition unit complex.

In another specific embodiment, the process comprises a method of identifying a polypeptide having an SH3 domain of interest comprising:

(a) contacting a multivalent recognition unit complex, which complex comprises (i) avidin or streptavidin, and (ii) biotinylated recognition units, with a plurality of polypeptides from a cDNA expression library, in which the recognition unit is a peptide having in the range of 6 to 60 amino acid residues and which selectively binds an SH3 domain; and

(b) identifying a polypeptide having a selective binding affinity for said recognition unit complex.

In another specific embodiment, the process comprises a method of identifying a polypeptide having a functional domain of interest or a functional equivalent thereof comprising:

(a) screening a random peptide library to identify a peptide that selectively binds a functional domain of interest; and

(b) screening a cDNA or genomic expression library with said peptide or a binding portion thereof to identify a polypeptide that selectively binds said peptide.

In a specific embodiment of the above method, the screening step (b) is carried out by use of said peptide in the form of multiple antigen peptides (MAP) or by use of said peptide cross-linked to bovine serum albumin or keyhole limpet hemocyanin.

In another specific embodiment, the process comprises a method of identifying a polypeptide having a functional domain of interest or a functional equivalent thereof comprising:

(a) screening a random peptide library to identify a plurality of peptides that selectively bind a functional domain of interest;

(b) determining at least part of the amino acid sequences of said peptides;

(c) determining a consensus sequence based upon the determined amino acid sequences of said peptides; and

(d) screening a cDNA or genomic expression library with a peptide comprising the consensus sequence to identify a polypeptide that selectively binds said peptide.
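Step (c) above, determining a consensus sequence, can be illustrated with a simple per-position majority vote. The sketch below is one possible approach under stated assumptions, not the procedure used in this disclosure: it assumes the peptides have already been aligned to equal length, and the function name `consensus_sequence`, the 60% threshold, the `x` wildcard, and the example peptides are all hypothetical.

```python
from collections import Counter
from typing import Sequence


def consensus_sequence(
    peptides: Sequence[str],
    min_fraction: float = 0.6,
    wildcard: str = "x",
) -> str:
    """Return a per-position consensus of pre-aligned, equal-length peptides.

    A residue is reported at a position only if it occurs in at least
    `min_fraction` of the peptides; otherwise `wildcard` is reported.
    """
    if not peptides:
        raise ValueError("at least one peptide is required")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be pre-aligned to equal length")

    consensus = []
    for column in zip(*peptides):  # iterate over alignment columns
        residue, count = Counter(column).most_common(1)[0]
        if count / len(peptides) >= min_fraction:
            consensus.append(residue)
        else:
            consensus.append(wildcard)
    return "".join(consensus)


# Hypothetical aligned binders (illustrative only, not sequences from this document).
example_peptides = ["RPLPPLP", "RALPSLP", "KSLPTLP", "RPLPALP"]
example_consensus = consensus_sequence(example_peptides)  # "RxLPxLP"
```

A peptide comprising the resulting consensus could then serve as the probe in step (d). A stricter or looser `min_fraction` trades specificity of the consensus against the number of fixed positions.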

In another specific embodiment, the process comprises a method of identifying a polypeptide having a functional domain of interest or a functional equivalent thereof comprising:

(a) screening a random peptide library to identify a first peptide that selectively binds a functional domain of interest;

(b) determining at least part of the amino acid sequence of said first peptide;

(c) searching a database containing the amino acid sequences of a plurality of expressed natural proteins to identify a protein containing an amino acid sequence homologous to the amino acid sequence of said first peptide; and

(d) screening a cDNA or genomic expression library with a second peptide comprising the sequence of said protein that is homologous to the amino acid sequence of said first peptide.
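The database search of step (c) above can be illustrated with a deliberately simple ungapped scan that slides the first peptide along each protein sequence and reports windows above a chosen percent identity. This is a toy sketch under stated assumptions: practical homology searches generally use substitution matrices and gapped alignment, and the function name `find_homologous_segments`, the 70% identity cutoff, and the toy "database" are hypothetical.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, int, str, float]  # (protein name, 0-based start, segment, identity)


def find_homologous_segments(
    peptide: str,
    proteins: Dict[str, str],
    min_identity: float = 0.7,
) -> List[Hit]:
    """Report ungapped windows whose identity to `peptide` meets the cutoff."""
    k = len(peptide)
    hits: List[Hit] = []
    for name, sequence in proteins.items():
        for start in range(len(sequence) - k + 1):
            window = sequence[start:start + k]
            matches = sum(a == b for a, b in zip(peptide, window))
            identity = matches / k
            if identity >= min_identity:
                hits.append((name, start, window, identity))
    # Best matches first.
    return sorted(hits, key=lambda hit: hit[3], reverse=True)


# Toy "database" (illustrative sequences only, not from this document).
toy_database = {
    "toyA": "MSTRPLPSLPGK",
    "toyB": "MKKAAAGGG",
}
toy_hits = find_homologous_segments("RPLPPLP", toy_database)
```

In this toy case the only reported segment is `RPLPSLP` at offset 3 of `toyA` (6 of 7 positions identical); a peptide comprising that segment would play the role of the "second peptide" in step (d).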

The polypeptide identified by the above-described methods thus should contain the functional domain of interest or a functional equivalent thereof (that is, having a functional domain that is identical, or having a functional domain that differs in sequence but is capable of binding to the same recognition unit). In a particular embodiment, the polypeptide identified is a novel polypeptide. In a preferred embodiment, the recognition unit that is used to form the multivalent recognition unit complex is isolated or identified from a random peptide library.

In a specific embodiment, the present invention provides amino acid sequences and DNA sequences encoding novel proteins containing SH3 domains. The SH3 domains vary in sequence but retain binding specificity to an SH3 domain recognition unit. Also provided are fragments and derivatives of the novel proteins containing SH3 domains as well as DNA sequences encoding the same. It will be apparent to one of ordinary skill in the art that also provided are proteins that vary slightly in sequence from the novel proteins by virtue of conservative amino acid substitutions. It will also be apparent to one of ordinary skill in the art that the novel proteins may be expressed recombinantly by standard methods. The novel proteins may also be expressed as fusion proteins with a variety of other proteins, e.g., glutathione S-transferase.

The present invention provides a purified polypeptide comprising an SH3 domain, said SH3 domain having an amino acid sequence selected from the group consisting of: SEQ ID NOs: 113-115, 118-121, 125-128, 133-139, 204-218, and 219. Also provided is a purified DNA encoding the polypeptide.

Also provided is a purified polypeptide comprising an SH3 domain, said polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NOs: 8, 10, 12, 18, 20, 22, 24, 30, 32, 38, 40, 190, 192, 194, 196, 198, 200, and 221. Also provided is a purified DNA encoding the polypeptide.

Also provided is a purified DNA encoding an SH3 domain, said DNA having a sequence selected from the group consisting of SEQ ID NOs: 7, 9, 11, 17, 19, 21, 23, 29, 31, 37, 39, 189, 191, 193, 195, 197, 199, and 220. Also provided is a nucleic acid vector comprising this purified DNA. Also provided is a recombinant cell containing this nucleic acid vector.

Also provided is a purified DNA encoding a polypeptide having an amino acid sequence selected from the group consisting of: SEQ ID NOs: 8, 10, 12, 18, 20, 22, 24, 30, 32, 38, 40, 190, 192, 194, 196, 198, 200, and 221. Also provided is a nucleic acid vector comprising this purified DNA. Also provided is a recombinant cell containing this nucleic acid vector.

Also provided is a purified DNA encoding a polypeptide comprising an amino acid sequence selected from the group consisting of: SEQ ID NOs: 113-115, 118-121, 125-128, 133-139, 204-218, and 219. Also provided is a nucleic acid vector comprising this purified DNA. Also provided is a recombinant cell containing this nucleic acid vector.

Also provided is a purified molecule comprising an SH3 domain of a polypeptide having an amino acid sequence selected from the group consisting of: SEQ ID NOs: 8, 10, 12, 18, 20, 22, 24, 30, 32, 38, 40, 190, 192, 194, 196, 198, 200, and 221.

Also provided is a fusion protein comprising (a) an amino acid sequence comprising an SH3 domain of a polypeptide having the amino acid sequence of SEQ ID NO: 8, 10, 12, 18, 20, 22, 24, 30, 32, 38, 40, 190, 192, 194, 196, 198, 200, or 221 joined via a peptide bond to (b) an amino acid sequence of at least six, or ten, or twenty amino acids from a different polypeptide. Also provided is a purified DNA encoding the fusion protein. Also provided is a nucleic acid vector comprising the purified DNA encoding the fusion protein. Also provided is a recombinant cell containing this nucleic acid vector. Also provided is a method of producing this fusion protein comprising culturing a recombinant cell containing a nucleic acid vector encoding said fusion protein such that said fusion protein is expressed, and recovering the expressed fusion protein.

The present invention also provides a purified nucleic acid hybridizable to a nucleic acid having a sequence selected from the group consisting of: SEQ ID NOs: 7, 9, 11, 17, 19, 21, 23, 29, 31, 37, 39, 189, 191, 193, 195, 197, 199, and 220.

The present invention also provides antibodies to a polypeptide having an amino acid sequence selected from the group consisting of: SEQ ID NOs: 113-115, 118-121, 125-128, 133-139, 204-218, and 219.

The present invention also provides antibodies to a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NOs: 8, 10, 12, 18, 20, 22, 24, 30, 32, 38, 40, 190, 192, 194, 196, 198, 200, and 221.

It is demonstrated by way of example herein that recognition units that comprise SH3 domain ligands derived from combinatorial peptide libraries may be used in the methods of the present invention as probes for the rapid discovery of novel proteins containing SH3 functional domains. The methods of the present invention require no prior knowledge of the characteristics of an SH3 domain's natural cellular ligand to initiate the process of discovery. One needs only enough purified SH3 domain-containing protein (by way of example, 1-5 μg) to select peptides from a random peptide library. In addition, because the methods of the present invention identify novel proteins from cDNA expression libraries based only on their binding properties, low primary sequence identity between the target SH3 domain and the SH3 domains of the novel proteins discovered need not be a limitation, provided some functional similarity between these SH3 domains is conserved. Also, the methods of the present invention are rapid, require inexpensive reagents, and employ simple and well-established laboratory techniques.

Using these methods, more than eighteen different SH3 domain-containing proteins have been identified, over half of which have not been previously described. While certain of these previously unknown proteins are clearly related to known genes such as amphiphysin and drebrin, others constitute new classes of signal transduction and/or cytoskeletal proteins. These include SH3P17 and SH3P18, two members of a new family of adaptor-like proteins comprised of multiple SH3 domains; SH3P12, a novel protein with three SH3 domains and a region similar to the extracellular peptide hormone sorbin; and SH3P4, SH3P8, and SH3P13, three members of a third new family of SH3-containing proteins. These novel proteins are described more fully in Sections 6.1 and 6.1.1. The high incidence of novel proteins identified by the methods of the present invention indicates that a large number of SH3 domain-containing proteins remain to be discovered by application of the methods of the invention.

One of ordinary skill in the art would recognize that the above-described novel proteins need not be used in their entirety in the various applications of those proteins described herein. In many cases it will be sufficient to employ that portion of the novel protein that contains the functional (e.g., SH3) domain. Such exemplary portions of SH3 domain-containing proteins are shown in FIGS. 10A and 10B. Accordingly, the present invention provides derivatives (e.g., fragments and molecules comprising these fragments) of novel proteins that contain SH3 domains, e.g., as shown in FIGS. 10A and 10B. Nucleic acids encoding these fragments or other derivatives are also provided.

In another embodiment, the present invention includes a method of identifying one or more novel polypeptides having an SH3 domain, said method comprising:

(a) identifying a recognition unit having a selective affinity for the SH3 domain by screening a peptide library with the SH3 domain;

(b) producing said recognition unit;

(c) contacting said recognition unit with a source of polypeptides; and

(d) identifying one or more novel polypeptides having a selective affinity for said recognition unit, which polypeptides comprise the SH3 domain.

5.1.1. Functional Domains

Functional domains of interest in the practice of the present invention can take many forms and may perform a variety of functions. For example, such functional domains may be involved in a number of cellular, biochemical, or physiological processes, such as cellular signal transduction, transcriptional regulation, translational regulation, cell adhesion, migration or transport, cytokine secretion and other aspects of the immune response, and the like. In particular embodiments of the present invention, the functional domains of interest may consist of regions known as SH1, SH2, SH3, PH, PTB, LIM, armadillo, and Notch/ankyrin repeat. See, e.g., Pawson, 1995, Nature 373:573-580; Cohen et al., 1995, Cell 80:237-248. Functional domains may also be chosen from among regions known as zinc fingers, leucine zippers, and helix-turn-helix or helix-loop-helix. Certain functional domains may be binding domains, such as DNA-binding domains or actin-binding domains. Still other functional domains may serve as sites of catalytic activity.

In one embodiment of the invention, a suitable target molecule containing the chosen functional domain of interest is selected. In the case of an SH3 domain, for example, a number of proteins (or functional domain-containing derivatives or analogs thereof) may be selected as the target molecule, including but not limited to, the Src family of proteins: Fyn, Lck, Lyn, Src, or Yes. Still other proteins contain an SH3 domain and can be used, including, but not limited to: Abl, Crk, Nck (other oncogenes), Grb2, PLCγ, RasGAP (proteins involved in signal transduction), ABP-1, myosin-1, spectrin (proteins found in the cytoskeleton), and neutrophil NADPH oxidase (an enzyme). In the case of a catalytic site, any catalytically active protein, such as an enzyme, can be used, particularly one whose catalytic site is known. For example, the catalytic site of the protein glutathione S-transferase (GST) can be used. Other target molecules that possess catalytic activity may include, but are not limited to, protein serine/threonine kinases, protein tyrosine kinases, serine proteases, DNA or RNA polymerases, phospholipases, GTPases, ATPases, PI-kinases, DNA methylases, metabolic enzymes, or protein glycosylases.

5.1.2. Recognition Units

By the phrase “recognition unit,” is meant any molecule having a selective affinity for the functional domain of the target molecule and, preferably, having a molecular weight of up to about 20,000 daltons. In a particular embodiment of the invention, the recognition unit has a molecular weight that ranges from about 100 to about 10,000 daltons.

Accordingly, preferred recognition units of the present invention possess a molecular weight of about 100 to about 5,000 daltons, preferably from about 100 to about 2,000 daltons, and most preferably from about 500 to about 1,500 daltons. As described further below, the recognition unit of the present invention can be a peptide, a carbohydrate, a nucleoside, an oligonucleotide, any small synthetic molecule, or a natural product. When the recognition unit is a peptide, the peptide preferably contains about 6 to about 60 amino acid residues.

When the recognition unit is a peptide, the peptide can have less than about 140 amino acid residues; preferably, the peptide has less than about 100 amino acid residues; preferably, the peptide has less than about 70 amino acid residues; preferably, the peptide has 20 to 50 amino acid residues; most preferably, the peptide has about 6 to 60 amino acid residues.

The peptide recognition units are preferably in the form of a multivalent peptide complex comprising avidin or streptavidin (optionally conjugated to a label such as alkaline phosphatase or horseradish peroxidase) and biotinylated peptides.

According to the present invention, a recognition unit (preferably in the form of a multivalent recognition unit complex) is used to screen a plurality of expression products of gene sequences containing nucleic acid sequences that are present in native RNA or DNA (e.g., cDNA library, genomic library).

The step of choosing a recognition unit can be accomplished in a number of ways that are known to those of ordinary skill, including but not limited to screening cDNA libraries or random peptide libraries for a peptide that binds to the functional domain of interest. See, e.g., Yu et al., 1994, Cell 76:933-945; Sparks et al., 1994, J. Biol. Chem. 269:23853-23856. Alternatively, a peptide or other small molecule or drug may be known to those of ordinary skill to bind to a certain target molecule and can be used. The recognition unit can even be synthesized from a lead compound, which again may be a peptide, carbohydrate, oligonucleotide, small drug molecule, or the like. The recognition unit can also be identified for use by doing searches (preferably via database) for molecules having homology to other, known recognition unit(s) having the ability to selectively bind to the functional domain of interest.

In a specific embodiment, the step of selecting a recognition unit for use can be effected by, e.g., the use of diversity libraries, such as random or combinatorial peptide or nonpeptide libraries, which can be screened for molecules that specifically bind to the functional domain of interest, e.g., an SH3 domain. Many libraries are known in the art that can be used, e.g., chemically synthesized libraries, recombinant libraries (e.g., phage display libraries), and in vitro translation-based libraries.

Examples of chemically synthesized libraries are described in Fodor et al., 1991, Science 251:767-773; Houghten et al., 1991, Nature 354:84-86; Lam et al., 1991, Nature 354:82-84; Medynski, 1994, Bio/Technology 12:709-710; Gallop et al., 1994, J. Medicinal Chemistry 37(9):1233-1251; Ohlmeyer et al., 1993, Proc. Natl. Acad. Sci. USA 90:10922-10926; Erb et al., 1994, Proc. Natl. Acad. Sci. USA 91:11422-11426; Houghten et al., 1992, Biotechniques 13:412; Jayawickreme et al., 1994, Proc. Natl. Acad. Sci. USA 91:1614-1618; Salmon et al., 1993, Proc. Natl. Acad. Sci. USA 90:11708-11712; PCT Publication No. WO 93/20242; and Brenner and Lerner, 1992, Proc. Natl. Acad. Sci. USA 89:5381-5383.

Examples of phage display libraries are described in Scott and Smith, 1990, Science 249:386-390; Devlin et al., 1990, Science 249:404-406; Christian et al., 1992, J. Mol. Biol. 227:711-718; Lenstra, 1992, J. Immunol. Meth. 152:149-157; Kay et al., 1993, Gene 128:59-65; and PCT Publication No. WO 94/18318 dated Aug. 18, 1994.

In vitro translation-based libraries include but are not limited to those described in PCT Publication No. WO 91/05058 dated Apr. 18, 1991; and Mattheakis et al., 1994, Proc. Natl. Acad. Sci. USA 91:9022-9026.

By way of examples of nonpeptide libraries, a benzodiazepine library (see, e.g., Bunin et al., 1994, Proc. Natl. Acad. Sci. USA 91:4708-4712) can be adapted for use. Peptoid libraries (Simon et al., 1992, Proc. Natl. Acad. Sci. USA 89:9367-9371) can also be used. Another example of a library that can be used, in which the amide functionalities in peptides have been permethylated to generate a chemically transformed combinatorial library, is described by Ostresh et al. (1994, Proc. Natl. Acad. Sci. USA 91:11138-11142).

The variety of non-peptide libraries that are useful in the present invention is great. For example, Ecker and Crooke, 1995, Bio/Technology 13:351-360 list benzodiazepines, hydantoins, piperazinediones, biphenyls, sugar analogs, β-mercaptoketones, arylacetic acids, acylpiperidines, benzopyrans, cubanes, xanthines, aminimides, and oxazolones as among the chemical species that form the basis of various libraries.

Non-peptide libraries can be classified broadly into two types: decorated monomers and oligomers. Decorated monomer libraries employ a relatively simple scaffold structure upon which a variety of functional groups is added. Often the scaffold will be a molecule with a known useful pharmacological activity. For example, the scaffold might be the benzodiazepine structure.

Non-peptide oligomer libraries utilize a large number of monomers that are assembled together in ways that create new shapes that depend on the order of the monomers. Among the monomer units that have been used are carbamates, pyrrolinones, and morpholinos. Peptoids, peptide-like oligomers in which the side chain is attached to the α amino group rather than the α carbon, form the basis of another version of non-peptide oligomer libraries. The first non-peptide oligomer libraries utilized a single type of monomer and thus contained a repeating backbone. Recent libraries have utilized more than one monomer, giving the libraries added flexibility.

Screening the libraries can be accomplished by any of a variety of commonly known methods. See, e.g., the following references, which disclose screening of peptide libraries: Parmley and Smith, 1989, Adv. Exp. Med. Biol. 251:215-218; Scott and Smith, 1990, Science 249:386-390; Fowlkes et al., 1992, BioTechniques 13:422-427; Oldenburg et al., 1992, Proc. Natl. Acad. Sci. USA 89:5393-5397; Yu et al., 1994, Cell 76:933-945; Staudt et al., 1988, Science 241:577-580; Bock et al., 1992, Nature 355:564-566; Tuerk et al., 1992, Proc. Natl. Acad. Sci. USA 89:6988-6992; Ellington et al., 1992, Nature 355:850-852; U.S. Pat. No. 5,096,815, U.S. Pat. No. 5,223,409, and U.S. Pat. No. 5,198,346, all to Ladner et al.; Rebar and Pabo, 1993, Science 263:671-673; and PCT Publication No. WO 94/18318.

In a specific embodiment, screening to identify a recognition unit can be carried out by contacting the library members with an SH3 domain immobilized on a solid phase and harvesting those library members that bind to the SH3 domain. Examples of such screening methods, termed “panning” techniques, are described in Parmley and Smith, 1988, Gene 73:305-318; Fowlkes et al., 1992, BioTechniques 13:422-427; PCT Publication No. WO 94/18318; and in references cited hereinabove.

In another embodiment, the two-hybrid system for selecting interacting proteins in yeast (Fields and Song, 1989, Nature 340:245-246; Chien et al., 1991, Proc. Natl. Acad. Sci. USA 88:9578-9582) can be used to identify recognition units that specifically bind to SH3 domains.

Where the recognition unit is a peptide, the peptide can be conveniently selected from any peptide library, including random peptide libraries, combinatorial peptide libraries, or biased peptide libraries. The term “biased” is used herein to mean that the method of generating the library is manipulated so as to restrict one or more parameters that govern the diversity of the resulting collection of molecules, in this case peptides.

Thus, a truly random peptide library would generate a collection of peptides in which the probability of finding a particular amino acid at a given position of the peptide is the same for all 20 amino acids. A bias can be introduced into the library, however, by specifying, for example, that a lysine occur every fifth amino acid or that positions 4, 8, and 9 of a decapeptide library be fixed to include only arginine. Clearly, many types of biases can be contemplated, and the present invention is not restricted to any particular bias. Furthermore, the present invention contemplates specific types of peptide libraries, such as phage-displayed peptide libraries and those that utilize a DNA construct comprising a lambda phage vector with a DNA insert.
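The distinction between a truly random library and a biased one can be illustrated computationally. The following is a minimal sketch only, assuming in silico sampling as a stand-in for the synthetic or phage-display methods actually used; the function names (`random_peptide`, `make_library`) and the library size are hypothetical and chosen purely for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def random_peptide(length, fixed=None, rng=random):
    """Return one peptide; `fixed` maps 1-based positions to a required residue."""
    fixed = fixed or {}
    residues = []
    for pos in range(1, length + 1):
        if pos in fixed:
            residues.append(fixed[pos])              # biased position
        else:
            residues.append(rng.choice(AMINO_ACIDS))  # uniform over all 20 residues
    return "".join(residues)


def make_library(size, length, fixed=None, seed=0):
    """Build `size` peptides; the seed (an assumption) keeps the sketch reproducible."""
    rng = random.Random(seed)
    return [random_peptide(length, fixed, rng) for _ in range(size)]


# Unbiased decapeptides: every position is drawn uniformly from all 20 residues.
unbiased = make_library(1000, 10)

# Biased decapeptides: positions 4, 8, and 9 are fixed to arginine (R).
biased = make_library(1000, 10, fixed={4: "R", 8: "R", 9: "R"})
```

In the biased library every member carries arginine at positions 4, 8, and 9, while the remaining seven positions stay fully randomized.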

As mentioned above, in the case of a recognition unit that is a peptide, the peptide may have about 6 to less than about 60 amino acid residues, preferably about 6 to about 25 amino acid residues, and most preferably, about 6 to about 15 amino acids. In another embodiment, a peptide recognition unit has in the range of 20-100 amino acids, or 20-50 amino acids. In the case of a bile acid receptor, for example, the recognition unit may be a bile acid, such as cholic acid or cholesterol, and may have a molecular weight of about 300 to about 600. If the functional domain relates to transcriptional control, the recognition unit may be a portion of a transcriptional factor, which may bind to a region of a gene of interest or to an RNA polymerase. The recognition unit may even be a nucleoside analog, such as cordycepin or the triphosphate thereof, capable of inhibiting RNA biosynthesis. The recognition unit may also be the carbohydrate portion of a glycoprotein, which may have a selective affinity for the asialoglycoprotein receptor, or the repeating glucan unit that exhibits a selective affinity for a cellulose binding domain or the active site of heparinase.

The selected recognition unit can be obtained by chemical synthesis or recombinant expression. It is preferably purified prior to use in screening a plurality of gene sequences.

5.1.3. Screening a Source of Polypeptides

After the recognition unit is chosen for use, the recognition unit is then contacted with a plurality of polypeptides, preferably containing a functional domain. In a particular embodiment of the invention, the plurality of polypeptides is obtained from a polypeptide expression library. The polypeptide expression library may be obtained, in turn, from cDNA, fragmented genomic DNA, and the like. In a specific embodiment, the library that is screened is a cDNA library of total poly A+ RNA of an organism, in general, or of a particular cell or tissue type or developmental stage or disease condition or stage. The expression library may utilize a number of expression vehicles known to those of ordinary skill, including but not limited to, recombinant bacteriophage, lambda phage, M13, a recombinant plasmid or cosmid, and the like.

The plurality of polypeptides or the DNA sequences encoding same may be obtained from a variety of natural or unnatural sources, such as a procaryotic or a eucaryotic cell, either a wild type, recombinant, or mutant. In particular, the plurality of polypeptides may be endogenous to microorganisms, such as bacteria, yeast, or fungi, to a virus, to an animal (including mammals, invertebrates, reptiles, birds, and insects) or to a plant cell.

In addition, the plurality of polypeptides may be obtained from more specific sources, such as the surface coat of a virion particle, a particular cell lysate, a tissue extract, or they may be restricted to those polypeptides that are expressed on the surface of a cell membrane.

Moreover, the plurality of polypeptides may be obtained from a biological fluid, particularly from humans, including but not limited to blood, plasma, serum, urine, feces, mucus, semen, vaginal fluid, amniotic fluid, or cerebrospinal fluid. The plurality of polypeptides may even be obtained from a fermentation broth or a conditioned medium, including all the polypeptide products secreted or produced by the cells previously in the broth or medium.

The step of contacting the recognition unit with the plurality of polypeptides may be effected in a number of ways. For example, one may contemplate immobilizing the recognition unit on a solid support and bringing a solution of the plurality of polypeptides in contact with the immobilized recognition unit. Such a procedure would be akin to an affinity chromatographic process, with the affinity matrix being comprised of the immobilized recognition unit. The polypeptides having a selective affinity for the recognition unit can then be purified by affinity selection. The nature of the solid support, process for attachment of the recognition unit to the solid support, solvent, and conditions of the affinity isolation or selection procedure would depend on the type of recognition unit in use but would be largely conventional and well known to those of ordinary skill in the art. Moreover, the valency of the recognition unit in the recognition unit complex used to screen the polypeptides is believed to affect the specificity of the screening step, and thus the valency can be chosen as appropriate in view of the desired specificity (see Sections 5.2 and 5.2.1).

Alternatively, one may also separate the plurality of polypeptides into substantially separate fractions comprising individual polypeptides. For instance, one can separate the plurality of polypeptides by gel electrophoresis, column chromatography, or like method known to those of ordinary skill for the separation of polypeptides. The individual polypeptides can also be produced by a transformed host cell in such a way as to be expressed on or about its outer surface. Individual isolates can then be “probed” by the recognition unit, optionally in the presence of an inducer should one be required for expression, to determine if any selective affinity interaction takes place between the recognition unit and the individual clone. Prior to contacting the recognition unit with each fraction comprising individual polypeptides, the polypeptides can optionally first be transferred to a solid support for additional convenience. Such a solid support may simply be a piece of filter membrane, such as one made of nitrocellulose or nylon.

In this manner, positive clones can be identified from a collection of transformed host cells of an expression library, which harbor a DNA construct encoding a polypeptide having a selective affinity for the recognition unit. The polypeptide produced by the positive clone includes the functional domain of interest or a functional equivalent thereof. Furthermore, the amino acid sequence of the polypeptide having a selective affinity for the recognition unit can be determined directly by conventional means of amino acid sequencing, or the coding sequence of the DNA encoding the polypeptide can frequently be determined more conveniently by use of standard DNA sequencing methods. The primary sequence can then be deduced from the corresponding DNA sequence.

If the amino acid sequence is to be determined from the polypeptide itself, one may use microsequencing techniques. The sequencing technique may include mass spectroscopy.

In certain situations, it may be desirable to wash away any unbound recognition unit from a mixture of the recognition unit and the plurality of polypeptides prior to attempting to determine or to detect the presence of a selective affinity interaction (i.e., the presence of a recognition unit that remains bound after the washing step). Such a wash step may be particularly desirable when the plurality of polypeptides is bound to a solid support.

As can be anticipated, the degree of selective affinities observed varies widely, generally falling in the range of about 1 nM to about 1 mM. In preferred embodiments of the present invention, the selective affinity is on the order of about 10 nM to about 100 μM, more preferably on the order of about 100 nM to about 10 μM, and most preferably on the order of about 100 nM to about 1 μM.

5.2. Specificity of Recognition Units

A particular recognition unit may have fairly generic selectivity for several members (e.g., three or four or more) of a “panel” of polypeptides having the domain of interest (or different versions of the domain of interest or functional equivalents of the domain of interest) or a fairly specific selectivity for only one or two, or possibly three, of the polypeptides among a “panel” of same. Furthermore, multiple recognition units, each exhibiting a range of selectivities among a “panel” of polypeptides, can be used to identify an increasingly comprehensive set of additional polypeptides that include the functional domain of interest.

Hence, in a population of related polypeptides, the functional domains of interest of each member may be schematically represented by a circle. See, by way of example, FIG. 7A. The circle of one polypeptide may overlap with that of another polypeptide. Such overlaps may be few or numerous for each polypeptide. A particular recognition unit, A, may recognize or interact with a portion of the circle of a given polypeptide which does not overlap with any other circle. Such a recognition unit would be fairly specific to that polypeptide. On the other hand, a second recognition unit, B, may recognize a region of overlap between two or more polypeptides. Such a recognition unit would consequently be less specific than the recognition unit A and may be characterized as having a more generic specificity depending on the number of polypeptides that it recognizes or interacts with.

It should also be apparent to those of ordinary skill that any number of B-type recognition units (B₁, B₂, B₃, etc.) can be present, each recognizing different “panels” of polypeptides. Hence, the use of multiple recognition units provides an increasingly exhaustive population of polypeptides, each of which exhibits a variation or evolution in the functional domain of interest present in the initial target molecule. It should also be apparent that the present method can be applied in an iterative fashion, such that the identification of a particular polypeptide can lead to the choice of another recognition unit. See, e.g., FIG. 7B. Use of this new recognition unit will lead, in turn, to the identification of other polypeptides that contain functional domains of interest that enhance the phenotypic and/or genotypic diversity of the population of “related” polypeptides.

Hence, with a given recognition unit, one may observe interaction with only one or two different polypeptides. With other recognition units, one may find three, four, or more selective interactions. In the situation in which only a single interaction is observed, it is likely, though not mandatory, that the selective affinity interaction is between the recognition unit and a replica of the initial target molecule (or a molecule very similar structurally and “functionally” to the initial target molecule).

5.2.1. Effect of the Presentation of the Recognition Unit Complex on the Specificity of the Recognition Unit-Functional Domain Interaction

The present inventors have found, unexpectedly, that the valency (i.e., whether it is a monomer, dimer, tetramer, etc.) of the recognition unit that is used to screen an expression library or other source of polypeptides apparently has a marked effect upon which genes or polypeptides are identified from the expression library or source of polypeptides. In particular, the specificity of the recognition unit-functional domain interaction appears to be affected by the valency of the recognition unit in the screening process. By this specificity is meant the selectivity in the functional domains to which the recognition unit will bind in the screening step.

As discussed above, in one embodiment, recognition units are obtained by screening a source of recognition units, e.g., a phage display library, for recognition units that bind to a particular target functional domain. Alternatively, database searches for recognition units with sequence homology to known recognition units can be employed. Of course, if a recognition unit for a particular target functional domain is already known, there is no need to screen a library or other source of recognition units; one can merely synthesize that particular recognition unit. The recognition unit, however obtained, is then used to screen an expression library or other source of polypeptides, to identify polypeptides that the recognition unit binds to. A recognition unit that identifies only its target functional domain is a recognition unit that is completely specific. A recognition unit that identifies one or two other polypeptides that do not contain identically the target functional domain, from among a plurality of polypeptides (e.g., of greater than 10⁴, 10⁶, or 10⁸ complexity), in addition to identifying a molecule comprising its target functional domain, is very or highly specific. A recognition unit that identifies most other polypeptides present that do not contain its target functional domain, in addition to identifying its target functional domain, is a non-specific recognition unit. In between very specific recognition units and non-specific recognition units, the present inventors have discovered that there are recognition units that recognize a small number of molecules having functional domains other than their target functional domains. These recognition units are said to have generic specificity.
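The categories just defined can be summarized as a simple decision rule. The sketch below is illustrative only: the text gives “one or two” additional polypeptides as the mark of a very specific recognition unit but sets no numerical boundary between “a small number” and “most,” so the `generic_max` cut-off is a hypothetical parameter introduced here, not a value taken from the present disclosure.

```python
def classify_specificity(off_target_hits, generic_max=10):
    """Place a recognition unit on the specificity continuum.

    off_target_hits: number of polypeptides identified in a screen that do NOT
        contain the target functional domain itself.
    generic_max: hypothetical upper bound on "a small number" of such hits;
        this threshold is an assumption made for illustration only.
    """
    if off_target_hits < 0:
        raise ValueError("off_target_hits must be non-negative")
    if off_target_hits == 0:
        return "completely specific"   # identifies only its target domain
    if off_target_hits <= 2:
        return "very specific"         # one or two other polypeptides
    if off_target_hits <= generic_max:
        return "generic"               # a small number of related domains
    return "non-specific"              # many or most other polypeptides


# Example: four screens of the same library with different recognition units.
labels = [classify_specificity(n) for n in (0, 2, 6, 5000)]
```

In practice the continuum has no sharp boundaries, so any such cut-off would have to be chosen in view of the complexity of the library being screened.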

Thus, there is a “specificity continuum”, from completely and very specific through generic to non-specific, that a recognition unit may evince. See FIG. 11 for a depiction of this specificity continuum. The Applicants have discovered that a major factor influencing the specificity exhibited by a recognition unit appears to be the valency of the recognition unit in the complex used to screen the expression library.

Usually, high specificity is considered to be desirable when screening a library. High specificity is exhibited, e.g., by affinity purified polyclonal antisera which, in general, are very specific. Monoclonal antibodies are also very specific. Small peptides in monovalent form, on the other hand, generally give very weak, non-specific signals when used to screen a library; thus, they are considered to be non-specific.

The present inventors have discovered that recognition units in the form of small peptides, in multivalent form, have a specificity midway between the high specificity of antibodies and the low/non-specificity of monovalent peptides. Multivalency of the recognition unit of at least two, in a recognition unit complex used to screen the gene library, is preferred, with a multivalency of at least four more preferred, to obtain a screening wherein specificity is eased but not forfeited. In particular, a multivalent (believed to be tetravalent) recognition unit complex comprising streptavidin or avidin (preferably conjugated to a label, e.g., an enzyme such as alkaline phosphatase or horseradish peroxidase, or a fluorogen, e.g., green fluorescent protein) and biotinylated peptide recognition units has an unexpected generic specificity. This allows such peptides to be used to screen libraries to identify classes of polypeptides containing functional domains that are similar but not identical to the peptides' target functional domains. These classes of polypeptides are identified despite the low level of homology at the amino acid level of the functional domains of the members of the classes.

In another specific embodiment, multivalent peptide recognition units may be in the form of multiple antigen peptides (MAP) (Tam, 1989, J. Imm. Meth. 124:53-61; Tam, 1988, Proc. Natl. Acad. Sci. USA 85:5409-5413). In this form, the peptide recognition unit is synthesized on a branching lysyl matrix using solid-phase peptide synthesis methods. Recognition units in the form of MAP may be prepared by methods known in the art (Tam, 1989, J. Imm. Meth. 124:53-61; Tam, 1988, Proc. Natl. Acad. Sci. USA 85:5409-5413), or, for example, by a stepwise solid-phase procedure on MAP resins (Applied Biosystems), utilizing methodology established by the manufacturer. MAP peptides may be synthesized comprising (recognition unit peptide)₂Lys₁, (recognition unit peptide)₄Lys₃, (recognition unit peptide)₈Lys₇ or more levels of branching.
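The stoichiometry of a branching lysyl matrix follows from the fact that each lysine offers two amino groups (α and ε), so that every additional level of branching doubles the number of peptide attachment sites. As a minimal sketch, assuming a fully and symmetrically branched core (the function name `map_core` is hypothetical):

```python
def map_core(levels):
    """Return (peptide copies, lysine residues) for a symmetric MAP core.

    Each lysine contributes two amino groups, so `levels` rounds of branching
    give 2**levels peptide attachment sites on 2**levels - 1 lysines.
    Assumes complete, symmetric branching at every level.
    """
    if levels < 1:
        raise ValueError("at least one level of branching is required")
    peptides = 2 ** levels
    lysines = peptides - 1
    return peptides, lysines


# One, two, and three levels of branching.
cores = [map_core(n) for n in (1, 2, 3)]
```

One level thus gives a divalent form on a single lysine, two levels a tetravalent form on three lysines, and three levels an octavalent form on seven lysines.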

The multivalent peptide recognition unit complexes may also be prepared by cross-linking the peptide to a carrier protein, e.g., bovine serum albumin (BSA), keyhole limpet hemocyanin (KLH), or an enzyme, by use of known cross-linking reagents. Such cross-linked peptide recognition units may be detected by, e.g., an antibody to the carrier protein or detection of the enzymatic activity of the carrier protein.

Furthermore, the present inventors have discovered what specificity is exhibited by various types of recognition units and their complexes, i.e., where these recognition units and their complexes fall in the specificity continuum. The present inventors have discovered a range of formats for presenting recognition units used to screen libraries. For example, the present inventors have determined that a peptide in the form of a bivalent fusion protein with alkaline phosphatase is very specific. The same peptide in the form of a fusion protein with the pIII protein of an M13-derived bacteriophage, expressed on the phage surface, has somewhat less, though still high, specificity. That same peptide when biotinylated in the form of a tetravalent streptavidin-alkaline phosphatase complex has generic specificity. Use of such a generically specific peptide permits the identification of a wide range of proteins from expression libraries or other sources of polypeptides, each protein containing an example of a particular functional domain.

Accordingly, the present invention provides a method of modulating the specificity of a peptide such that the peptide can be used as a recognition unit to screen a plurality of polypeptides, thus identifying polypeptides that have a functional domain. In a specific embodiment, specificity is generic so as to provide for the identification of polypeptides having a functional domain that varies in sequence from that of the target functional domain known to bind the recognition unit under conditions of high specificity. In a particular embodiment, the method comprises forming a tetravalent complex of the biotinylated peptide and streptavidin-alkaline phosphatase prior to use for screening an expression library.

5.3. Kits

The present invention is also directed to an assay kit which can be useful in the screening of drug candidates. In a particular embodiment of the present invention, an assay kit is contemplated which comprises in one or more containers (a) a polypeptide containing a functional domain of interest; and (b) a recognition unit having a selective affinity for the polypeptide. The kit optionally further comprises a detection means for determining the presence of a polypeptide-recognition unit interaction or the absence thereof.

In a specific embodiment, either the polypeptide containing the functional domain or the recognition unit is labeled. A wide range of labels can be used to advantage in the present invention, including but not limited to biotin, which can be conjugated to the recognition unit by conventional means. Alternatively, the label may comprise a fluorogen, an enzyme, an epitope, a chromogen, or a radionuclide. Preferably, the biotin is conjugated by covalent attachment to either the polypeptide or the recognition unit. The polypeptide or, preferably, the recognition unit is immobilized on a solid support. The detection means employed to detect the label will depend on the nature of the label and can be any known in the art, e.g., film to detect a radionuclide; an enzyme substrate that gives rise to a detectable signal to detect the presence of an enzyme; antibody to detect the presence of an epitope, etc.

A further embodiment of the assay kit of the present invention includes the use of a plurality of polypeptides, each polypeptide containing a functional domain of interest. The assay kit further comprises at least one recognition unit having a selective affinity for each of the plurality of polypeptides and a detection means for determining the presence of a polypeptide-recognition unit interaction or the absence thereof.

A kit is provided that comprises, in one or more containers, a first molecule comprising an SH3 domain and a second molecule that binds to the SH3 domain, i.e., a recognition unit, where the SH3 domain is a novel SH3 domain identified by the methods of the present invention.

In a specific embodiment, the present invention provides an assay kit comprising in one or more containers:

(a) a purified polypeptide containing a functional domain of interest, in which the functional domain of interest is a domain selected from the group consisting of an SH1, SH2, SH3, PH, PTB, LIM, armadillo, Notch/ankyrin repeat, zinc finger, leucine zipper, and helix-turn-helix; and

(b) a purified recognition unit having a selective binding affinity for said functional domain in said polypeptide.

In the above assay kit, the polypeptide may comprise an amino acid sequence selected from the group consisting of SEQ ID NOs: 8, 10, 12, 18, 20, 22, 24, 30, 32, 38, 40, 190, 192, 194, 196, 198, 200, 221, 113-115, 118-121, 125-128, 133-139, 204-218, and 219.

In the above assay kit, the polypeptide may comprise an amino acid sequence selected from the group consisting of SEQ ID NOs: 6, 14, 16, 26, 28, 34, 36, 112, 116, 117, 122-124, 129-132, and 140.

In other embodiments of the above-described assay kit, the recognition unit may be a peptide. The recognition unit may be labeled with, e.g., an enzyme, an epitope, a chromogen, or biotin.

In another specific embodiment, the present invention provides an assay kit comprising in containers:

(a) a plurality of purified polypeptides, each polypeptide in a separate container and each polypeptide containing a functional domain of interest in which the functional domain of interest is a domain selected from the group consisting of an SH1, SH2, SH3, PH, PTB, LIM, armadillo, Notch/ankyrin repeat, zinc finger, leucine zipper, and helix-turn-helix; and

(b) at least one recognition unit having a selective binding affinity for said functional domain in each of said plurality of polypeptides.

The present invention also provides an assay kit comprising in one or more containers:

(a) a plurality of purified polypeptides, each polypeptide in a separate container and each polypeptide containing an SH3 domain; and

(b) at least one peptide having a selective affinity for the SH3 domain in each of said plurality of polypeptides.

The present invention also provides a kit comprising a plurality of purified polypeptides comprising a functional domain of interest, each polypeptide in a separate container, and each polypeptide having a functional domain of a different sequence but capable of displaying the same binding specificity.

In the above-described kits, the polypeptides may have an amino acid sequence selected from the group consisting of: SEQ ID NOs: 8, 10, 12, 18, 26, 22, 24, 30, 32, 38, 40, 190, 192, 194, 196, 198, 200, 221.

In the above-described kits, the functional domain may be an SH3 domain.

The molecular components of the kits are preferably purified.

The kits of the present invention may be used in the methods for identifying new drug candidates and determining the specificities thereof that are described in Section 5.4.

5.4. Assays for the Identification of Potential Drug Candidates and Determining the Specificity Thereof

The present invention also provides methods for identifying potential drug candidates (and lead compounds) and determining the specificities thereof. For example, knowing that a polypeptide with a functional domain of interest and a recognition unit, e.g., a binding peptide, exhibit a selective affinity for each other, one may attempt to identify a drug that can exert an effect on the polypeptide-recognition unit interaction, e.g., either as an agonist or as an antagonist (inhibitor) of the interaction. With this assay, one can screen a collection of candidate “drugs” for the one exhibiting the most desired characteristic, e.g., the most efficacious in disrupting the interaction or in competing with the recognition unit for binding to the polypeptide.

Alternatively, one may utilize the different selectivities that a particular recognition unit may exhibit for different polypeptides bearing the same, similar, or functionally equivalent functional domains. Thus, one may tailor the screen to identify drug candidates that exhibit more selective activities directed to specific polypeptide-recognition unit interactions, among the “panel” of possibilities. Thus, for example, a drug candidate may be screened to identify the presence or absence of an effect on particular binding interactions that could potentially lead to undesirable side effects.

Indeed, an intriguing application of the present invention is described as follows. A known antiviral agent, FIAU (a halogenated nucleoside analog), is effective at given dosages against the virus that causes hepatitis B. This compound is suspected of causing toxic side effects, however, which give rise to liver failure in certain patients to whom the drug is administered. According to the present invention, an assay is provided which can be used to develop a new generation of FIAU-derived drugs that maintain their effectiveness against viral replication while reducing liver toxicity. Such an assay is provided by choosing FIAU as a recognition unit having a selective affinity for a polypeptide present in the hepatitis B virus or a cell infected with the virus. This polypeptide or family of polypeptides having the functional domain of interest is obtained by allowing the chosen recognition unit, FIAU, to come into contact with an expression library comprised of the hepatitis B virus genome and/or a cDNA expression library of infected cells, according to the methods of the present invention.

Likewise, the chosen recognition unit is allowed to come into contact with a plurality of polypeptides obtained from a sample of a human liver extract or of noninfected hepatocytes. In this manner, a “panel” of polypeptides each of which exhibits a selective affinity for the chosen recognition unit is identified. As described above, this panel is used to determine the activities of drug (FIAU) homologs, analogs, or derivatives in terms of, say, selective inhibition of viral polypeptide-FIAU interaction versus liver polypeptide-FIAU interaction. Hence, those drug homologs, analogs, or derivatives that maintain a selective affinity for the viral polypeptide (or infected cell polypeptide) while failing to interact with or having a minimal binding affinity for liver polypeptides (and, hence, have reduced toxicity in the liver due to elimination of undesirable molecular interactions) can be identified and selected. Additional iterations of this process can be performed if so desired.
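The comparison across such a panel can be expressed as a selectivity ratio for each compound. The following sketch is illustrative only: the compound names other than FIAU and all dissociation constants are hypothetical placeholders, not measured data, and the use of a simple ratio of dissociation constants is one assumed way of scoring selectivity among several possible.

```python
def selectivity_ratio(kd_viral_um, kd_liver_um):
    """Ratio of liver Kd to viral Kd (both in micromolar).

    A lower Kd means tighter binding, so a ratio well above 1 indicates a
    compound that binds the viral polypeptide far more tightly than the
    liver polypeptide.
    """
    if kd_viral_um <= 0 or kd_liver_um <= 0:
        raise ValueError("dissociation constants must be positive")
    return kd_liver_um / kd_viral_um


def rank_by_selectivity(panel):
    """Sort compound names from most to least selective for the viral target."""
    return sorted(panel, key=lambda name: selectivity_ratio(*panel[name]), reverse=True)


# Hypothetical panel: name -> (Kd for viral polypeptide, Kd for liver polypeptide).
# All numbers are invented for illustration.
panel = {
    "FIAU": (1.0, 2.0),
    "analog_A": (1.5, 150.0),
    "analog_B": (0.8, 4.0),
}
ranking = rank_by_selectivity(panel)
```

With these placeholder values, the hypothetical analog_A would be preferred: it retains affinity for the viral polypeptide close to that of the parent compound while binding the liver polypeptide about one hundred-fold more weakly.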

Therefore, the present invention contemplates an assay for screening a drug candidate comprising: (a) allowing at least one polypeptide comprising a functional domain of interest to come into contact with at least one recognition unit having a selective affinity for the polypeptide in the presence of an amount of a drug candidate, such that the polypeptide and the recognition unit are capable of interacting when brought into contact with one another in the absence of said drug candidate, and in which the functional domain of interest is a domain selected from the group consisting of an SH1, SH2, SH3, PH, PTB, LIM, armadillo, Notch/ankyrin repeat, zinc finger, leucine zipper, and helix-turn-helix; and (b) determining the effect, if any, of the presence of the amount of the drug candidate on the interaction of the polypeptide with the recognition unit.

In one embodiment, the effect of the drug candidate upon multiple, different interacting polypeptide-recognition unit pairs is determined in which at least some of said polypeptides have a functional domain that differs in sequence but is capable of displaying the same binding specificity as the functional domain in another of said polypeptides.

In another embodiment, at least one of said at least one polypeptide or recognition unit contains a consensus functional domain and consensus recognition unit, respectively.

In another embodiment, the drug candidate is an inhibitor of the polypeptide-recognition unit interaction that is identified by detecting a decrease in the binding of polypeptide to recognition unit in the presence of such inhibitor.
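A decrease in binding of this kind is commonly expressed as a percent inhibition relative to a control run without the drug candidate. The sketch below assumes a signal (for example, from an enzyme label) that is proportional to the amount of bound recognition unit; the function names, the background correction, and the 50% threshold are all hypothetical choices made for illustration.

```python
def percent_inhibition(signal_with_candidate, signal_without_candidate, background=0.0):
    """Percent decrease in binding signal caused by a drug candidate.

    Both signals are corrected by the same background reading (an assumed
    no-polypeptide control) before the comparison is made.
    """
    control = signal_without_candidate - background
    if control <= 0:
        raise ValueError("control signal must exceed background")
    treated = signal_with_candidate - background
    return 100.0 * (1.0 - treated / control)


def is_inhibitor(signal_with_candidate, signal_without_candidate,
                 background=0.0, threshold=50.0):
    """Flag a candidate whose inhibition meets a hypothetical threshold (percent)."""
    return percent_inhibition(
        signal_with_candidate, signal_without_candidate, background
    ) >= threshold


# Example readings in arbitrary units: 0.3 with candidate, 1.1 without, 0.1 background.
inhibition = percent_inhibition(0.3, 1.1, background=0.1)
```

A negative value from `percent_inhibition` would indicate that the candidate increased the binding signal, which corresponds to the agonist case mentioned earlier in this section.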

In another embodiment, said polypeptide is a polypeptide containing an SH3 domain produced by a method comprising:

(i) screening a peptide library with an SH3 domain to obtain one or more peptides that bind the SH3 domain;

(ii) using one of the peptides from step (i) to screen a source of polypeptides to identify one or more polypeptides containing an SH3 domain;

(iii) determining the amino acid sequence of the polypeptides identified in step (ii); and

(iv) producing the one or more novel polypeptides containing an SH3 domain.

In another embodiment, said polypeptide is a polypeptide containing an SH3 domain produced by a method comprising:

(i) screening a peptide library with an SH3 domain to obtain a plurality of peptides that bind the SH3 domain;

(ii) determining a consensus sequence for the peptides obtained in step (i);

(iii) producing a peptide comprising the consensus sequence;

(iv) using the peptide comprising the consensus sequence to screen a source of polypeptides to identify one or more polypeptides containing an SH3 domain;

(v) determining the amino acid sequence of the polypeptides identified in step (iv); and

(vi) producing the one or more polypeptides containing an SH3 domain.
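Step (ii) of the method above, determining a consensus sequence, can be sketched as a per-position majority vote over the selected peptides. This is a simplified illustration that assumes the peptides are of equal length and already aligned; the example sequences are hypothetical, and the `min_fraction` threshold and the `x` wildcard symbol are assumptions introduced here rather than parameters of the disclosed method.

```python
from collections import Counter


def consensus(peptides, min_fraction=0.5, wildcard="x"):
    """Per-position consensus of equal-length, pre-aligned peptide sequences.

    A residue is reported at a position only if it occurs in at least
    `min_fraction` of the peptides; otherwise `wildcard` is written.
    On an exact tie, the residue seen first at that position is used.
    """
    if not peptides:
        raise ValueError("at least one peptide is required")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned to equal length")
    result = []
    for column in zip(*peptides):  # iterate over alignment columns
        residue, count = Counter(column).most_common(1)[0]
        result.append(residue if count / len(peptides) >= min_fraction else wildcard)
    return "".join(result)


# Hypothetical binders recovered in step (i); sequences invented for illustration.
binders = ["RPLPPLP", "RALPPLP", "RPLPALP", "KPLPPLP"]
loose = consensus(binders)                     # residue present in at least half
strict = consensus(binders, min_fraction=0.8)  # only well-conserved positions kept
```

A stricter threshold leaves only the invariant positions, which may be useful when designing the consensus peptide of step (iii) with randomized or variable flanking residues.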

In a preferred embodiment, the effect of the drug candidate upon multiple, different interacting polypeptide-recognition unit pairs is determined in which preferably at least some (e.g., at least 2, 3, 4, 5, 7, or 10) of said polypeptides have functional domains that vary in sequence yet are capable of displaying the same binding specificity, i.e., binding to the same recognition unit. In another specific embodiment, at least one of said polypeptides and/or recognition units contains a consensus functional domain and recognition unit, respectively (and thus is not known to be a naturally expressed protein). In one embodiment, the polypeptide is a novel polypeptide identified by the methods of the present invention. In a specific embodiment, an inhibitor of the polypeptide-recognition unit interaction is identified by detecting a decrease in the binding of polypeptide to recognition unit in the presence of such inhibitor.

A common problem in the development of new drugs is that of identifying a single compound, or a small number of compounds, that possess a desirable characteristic from among a background of a large number of compounds that lack that desired characteristic. This problem arises both in the testing of compounds that are natural products from plant, animal, or microbial sources and in the testing of man-made compounds. Typically, hundreds, or even thousands, of compounds are randomly screened by the use of in vitro assays such as those that monitor the compound's effect on some enzymatic activity, its ability to bind to a reference substance such as a receptor or other protein, or its ability to disrupt the binding between a receptor and its ligand.

The compounds which pass this original screening test are known as“lead” compounds. These lead compounds are then put through furthertesting, including, eventually, in vivo testing in animals and humans,from which the promise shown by the lead compounds in the original invitro tests is either confirmed or refuted. See Remington'sPharmaceutical Sciences, 1990, A. R. Gennaro, ed., Chapter 8, pages60-62, Mack Publishing Co., Easton, Pa.; Ecker and Crooke, 1995,Bio/Technology 13:351-360.

There is a continual need for new compounds to be tested in the in vitroassays that make up the first testing step described above. There isalso a continual need for new assays by which the pharmacologicalactivities of these compounds may be tested. It is an object of thepresent invention to provide such new assays to determine whether acandidate compound is capable of affecting the binding between apolypeptide containing a functional domain and a recognition unit thatbinds to that functional domain. In particular, it is an object of thepresent invention to provide polypeptides, particularly novel ones,containing functional domains and their corresponding recognition unitsfor use in the above-described assays. The use of these polypeptidesgreatly expands the number of assays that may be used to screenpotential drug candidates for useful pharmacological activities (as wellas to identify potential drug candidates that display adverse orundesirable pharmacological activities). In one particular embodiment ofthe present invention, the polypeptides contain an SH3 domain.

In one embodiment of the present invention, such polypeptides are identified by a method comprising: using a recognition unit that is capable of binding to a predetermined functional domain to screen a source of polypeptides, thus identifying novel polypeptides containing the functional domain or a similar functional domain.

In a particular embodiment of the above-described method, the novel polypeptide comprises an SH3 domain and is obtained by:

(i) screening a peptide library with the SH3 domain to obtain one or more peptides that bind the SH3 domain;

(ii) using one of the peptides from step (i), preferably in the form of a multivalent complex, to screen a source of polypeptides to identify one or more novel polypeptides containing SH3 domains;

(iii) determining the amino acid sequence of the polypeptides identified in step (ii); and

(iv) producing the one or more novel polypeptides containing SH3 domains.

In another embodiment of the above-described method, the novel polypeptide containing an SH3 domain is obtained by:

(i) screening a peptide library with the SH3 domain to obtain peptides that bind the SH3 domain;

(ii) determining a consensus sequence for the peptides obtained in step (i);

(iii) producing a peptide comprising the consensus sequence;

(iv) using the peptide comprising the consensus sequence to screen a source of polypeptides to identify one or more novel polypeptides containing SH3 domains;

(v) determining the amino acid sequence of the novel polypeptides identified in step (iv); and

(vi) producing the one or more novel polypeptides containing SH3 domains.
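
Step (ii) of the method above, determining a consensus sequence, can be illustrated with a minimal sketch. It assumes the binding peptides have already been aligned to equal length. The peptide sequences, the function name `consensus`, and the 60% agreement threshold are hypothetical choices made for illustration and are not taken from this specification.

```python
from collections import Counter

def consensus(peptides, threshold=0.6):
    """Return a consensus string for equal-length, pre-aligned peptides.

    A position is reported as a residue when that residue occurs in at
    least `threshold` of the peptides; otherwise it is reported as 'x'.
    """
    if not peptides:
        raise ValueError("at least one peptide is required")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned to the same length")
    result = []
    for column in zip(*peptides):
        # Most frequent residue in this alignment column and its count
        residue, count = Counter(column).most_common(1)[0]
        result.append(residue if count / len(peptides) >= threshold else "x")
    return "".join(result)

# Hypothetical aligned peptides recovered in step (i)
binders = ["RPLPPLP", "RALPPLP", "RPLPSLP", "KPLPALP", "RPLPTLP"]
print(consensus(binders))
```

A peptide comprising the resulting consensus would then be synthesized for use in step (iv). In practice the alignment itself and the choice of threshold would need to be justified experimentally.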

One of ordinary skill in the art will recognize that it will not always be necessary to utilize the entire novel polypeptide containing the SH3 domain in the assays described herein. Often, a portion of the polypeptide that contains the SH3 domain will be sufficient, e.g., a glutathione S-transferase (GST)-SH3 domain fusion protein. See FIGS. 10A and 10B for a depiction of the portions of the exemplary novel polypeptides that contain SH3 domains.

A typical assay of the present invention consists of at least the following components: (1) a molecule (e.g., protein or polypeptide) comprising a functional domain; (2) a recognition unit that selectively binds to the functional domain; and (3) a candidate compound, suspected of having the capacity to affect the binding between the protein containing the functional domain and the recognition unit. The assay components may further comprise (4) a means of detecting the binding of the protein comprising the functional domain and the recognition unit. Such means can be, e.g., a detectable label affixed to the protein comprising the functional domain, the recognition unit, or the candidate compound.

In a specific embodiment, the protein comprising the functional domain is a novel protein discovered by the methods of the present invention.

In another specific embodiment, the invention provides a method of identifying a compound that affects the binding of a molecule comprising a functional domain and a recognition unit that selectively binds to the functional domain comprising:

(a) contacting the molecule comprising the functional domain and the recognition unit under conditions conducive to binding in the presence of a candidate compound and measuring the amount of binding between the molecule and the recognition unit;

(b) comparing the amount of binding in step (a) with the amount of binding known or determined to occur between the molecule and the recognition unit in the absence of the candidate compound, where a difference in the amount of binding between step (a) and the amount of binding known or determined to occur between the molecule and the recognition unit in the absence of the candidate compound indicates that the candidate compound is a compound that affects the binding of the molecule comprising a functional domain and the recognition unit. In a specific embodiment, the molecule comprising the functional domain is a novel protein discovered by the methods of the present invention. In another specific embodiment, the functional domain is an SH3 domain.

In one embodiment, the assay comprises allowing the polypeptide containing an SH3 domain to contact a recognition unit that selectively binds to the SH3 domain in the presence and in the absence of the candidate compound under conditions such that binding of the recognition unit to the protein containing an SH3 domain will occur unless that binding is disrupted or prevented by the candidate compound. By detecting the amount of binding of the recognition unit to the protein containing an SH3 domain in the presence of the candidate compound and comparing that amount of binding to the amount of binding of the recognition unit to the protein or polypeptide containing an SH3 domain in the absence of the candidate compound, it is possible to determine whether the candidate compound affects the binding and thus is a useful lead compound for the modulation of the activity of proteins containing the SH3 domain. The effect of the candidate compound may be to either increase or decrease the binding.

One version of an assay suitable for use in the present invention comprises binding the protein containing an SH3 domain to a solid support such as the wells of a microtiter plate. The wells contain a suitable buffer and other substances to ensure that conditions in the wells permit the binding of the protein or polypeptide containing an SH3 domain to its recognition unit. The recognition unit and a candidate compound are then added to the wells. The recognition unit is preferably labeled, e.g., it might be biotinylated or labeled with a radioactive moiety, or it might be linked to an enzyme, e.g., alkaline phosphatase. After a suitable period of incubation, the wells are washed to remove any unbound recognition unit and compound. If the candidate compound does not interfere with the binding of the protein or polypeptide containing an SH3 domain to the labeled recognition unit, the labeled recognition unit will bind to the protein or polypeptide containing an SH3 domain in the well. This binding can then be detected. If the candidate compound interferes with the binding of the protein or polypeptide containing an SH3 domain and the labeled recognition unit, label will not be present in the wells, or will be present to a lesser degree than is the case when compared to control wells that contain the protein or polypeptide containing an SH3 domain and the labeled recognition unit but to which no candidate compound is added. Of course, it is possible that the presence of the candidate compound will increase the binding between the protein or polypeptide containing an SH3 domain and the labeled recognition unit. Alternatively, the recognition unit can be affixed to a solid substrate during the assay. Functional domains other than SH3 domains and their corresponding recognition units can also be used.
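
The comparison between wells containing the candidate compound and control wells can be sketched as follows. This is only an illustration of the comparison step: the signal values, the function name `classify_compound`, and the 25% cutoff for calling an effect are assumptions that do not appear in this specification, and a real assay would need replicate statistics and background correction.

```python
from statistics import mean

def classify_compound(control_signals, test_signals, min_change=0.25):
    """Compare label signal in wells without and with a candidate compound.

    Returns (description, fractional_change), where fractional_change is
    (test - control) / control computed on the mean signals.
    """
    control = mean(control_signals)
    test = mean(test_signals)
    if control <= 0:
        raise ValueError("control wells must show a positive binding signal")
    change = (test - control) / control
    if change <= -min_change:
        return "inhibits binding", change
    if change >= min_change:
        return "enhances binding", change
    return "no clear effect", change

# Hypothetical label readings (arbitrary units) after washing the wells
control_wells = [1.00, 1.10, 0.90]   # no candidate compound added
compound_wells = [0.30, 0.35, 0.25]  # candidate compound added
label, change = classify_compound(control_wells, compound_wells)
print(label, round(change, 2))
```

Because the compound may either decrease or increase binding, the sketch reports both directions rather than testing only for inhibition.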

In a specific embodiment of the above-described method, the protein or polypeptide containing an SH3 domain is a novel protein or polypeptide containing an SH3 domain that has been identified by the methods of the present invention.

5.5. Use of Polypeptides Containing Functional Domains to Discover Polypeptides Involved in Pharmacological Activities

Using the methods of the present invention, it is possible to identify and isolate large numbers of polypeptides containing functional domains, e.g., SH3 domains. Using these polypeptides, one can construct a matrix relating the polypeptides to an array of candidate drug compounds. For example, Table 1 shows such a matrix.

TABLE 1

      A   B   C   D   E   F   G   H   I   J
 1
 2        X       X               X
 3
 4
 5                        X
 6
 7            X                   X
 8
 9    X
10

In Table 1, the columns headed by letters at the top of the table represent different polypeptides containing SH3 domains (preferably novel polypeptides identified by the methods of the invention). The rows numbered along the left side of the table represent recognition units with various specificity to SH3 domains. For each candidate drug compound, a table such as Table 1 is generated from the results of binding assays. An X placed at the intersection of a particular numbered row and lettered column represents a positive assay for binding, i.e., the candidate drug compound affected the binding of the recognition unit of that particular row to the SH3 domain of that particular column.

Such data as that illustrated above is used to determine whether candidate drug compounds display or are at risk of displaying desirable or undesirable physiological or pharmacological activities. For example, in Table 1, the drug compound inhibits the binding of recognition unit 2 to the SH3 domains of polypeptides B, D, and H; the compound inhibits the binding of recognition unit 5 to the SH3 domain of polypeptide F; the compound inhibits the binding of recognition unit 7 to the SH3 domains of polypeptides C and H; and the compound inhibits the binding of recognition unit 9 to the SH3 domain of polypeptide A.

If interaction with polypeptide H leads to the desirable physiological or pharmacological activity, then this drug candidate might be a good lead. However, interaction with polypeptides A, B, C, D, and F would need to be evaluated for potential side effects.
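
The matrix of Table 1 and the reading of it given above can be expressed as a small data structure. The following is a minimal sketch: the entries reproduce the X marks described for Table 1, while the names `table_1`, `affected_pairs`, and `polypeptides_to_evaluate` are hypothetical and introduced only for illustration.

```python
# Table 1 as a mapping: recognition unit (row) -> polypeptides (columns)
# whose binding to that unit was affected by the candidate drug compound.
table_1 = {
    1: set(), 2: {"B", "D", "H"}, 3: set(), 4: set(), 5: {"F"},
    6: set(), 7: {"C", "H"}, 8: set(), 9: {"A"}, 10: set(),
}

def affected_pairs(table):
    """List (recognition unit, polypeptide) pairs marked with an X."""
    return [(unit, poly) for unit in sorted(table) for poly in sorted(table[unit])]

def polypeptides_to_evaluate(table, desired):
    """Polypeptides affected by the compound other than the desired ones."""
    hit = set().union(*table.values())
    return sorted(hit - set(desired))

print(affected_pairs(table_1))
print(polypeptides_to_evaluate(table_1, desired={"H"}))
```

With polypeptide H taken as the desired target, the second call lists the remaining affected polypeptides, which are the ones that would need evaluation for potential side effects.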

As the maps are generated and pharmacological effects observed, the maps will allow strategic assessment of the specificity necessary to obtain the desired pharmacological effect. For example, if compounds 2 and 7 are able to affect some pharmacological activity, while compounds 5 and 9 do not affect that activity, then polypeptide H is likely to be involved in that pharmacological activity. For example, if compounds 2 and 7 were both able to inhibit mast cell degranulation, while compounds 5 and 9 did not, it is likely that polypeptide H is involved in mast cell degranulation.
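
The inference in the preceding paragraph amounts to a set operation on the map: take what the active entries (2 and 7) have in common and remove anything also associated with the inactive entries (5 and 9). The sketch below assumes the entries of Table 1 for rows 2, 5, 7, and 9; the function name `implicated_polypeptides` is hypothetical.

```python
def implicated_polypeptides(table, active, inactive):
    """Polypeptides shared by every active entry and absent from all inactive ones."""
    if not active:
        raise ValueError("at least one active entry is required")
    shared = set.intersection(*(set(table[row]) for row in active))
    excluded = set().union(*(table[row] for row in inactive))
    return sorted(shared - excluded)

# Entries of Table 1 relevant to the example in the text
profile = {2: {"B", "D", "H"}, 5: {"F"}, 7: {"C", "H"}, 9: {"A"}}

# Entries 2 and 7 affect the activity; entries 5 and 9 do not.
print(implicated_polypeptides(profile, active=[2, 7], inactive=[5, 9]))
```

An empty result would indicate that no single polypeptide in the map accounts for the observed pattern of activity.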

Accordingly, the present invention provides a method of utilizing the polypeptides comprising functional domains of the present invention in an assay to determine the participation of those polypeptides in pharmacological activities. In a particular embodiment, the polypeptides comprise SH3 domains.

In another embodiment, the method comprises:

(a) contacting a drug candidate with a molecule comprising a functional domain under conditions conducive to binding, and detecting or measuring any specific binding that occurs; and

(b) repeating step (a) with a plurality of different molecules, each comprising a different functional domain but capable of binding to a single predetermined recognition unit under appropriate conditions.

Preferably, at least one of said molecules is a novel polypeptide identified by the methods of the present invention. In a specific embodiment, the molecules comprise the SH3 domains of Src, Abl, Cortactin, Phospholipase Cγ, Nck, Crk, p53 bp2, Amphiphysin, Grb2, RasGap, or Phosphatidyl-inositol 3′ kinase.

The present invention also provides a method of determining the potential pharmacological activities of a molecule comprising:

(a) contacting the molecule with a compound comprising a functional domain under conditions conducive to binding;

(b) detecting or measuring any specific binding that occurs; and

(c) repeating steps (a) and (b) with a plurality of different compounds, each compound comprising a functional domain of different sequence but capable of displaying the same binding specificity.

In a specific embodiment the functional domain is an SH3 domain.

In another embodiment, the compounds comprise the SH3 domains of Src, Abl, Cortactin, Phospholipase Cγ, Nck, Crk, p53 bp2, Amphiphysin, Grb2, RasGap, or Phosphatidyl-inositol 3′ kinase.

The present invention also provides a method of identifying a compound that affects the binding of a molecule comprising a functional domain to a recognition unit that selectively binds to the functional domain comprising:

(a) contacting the molecule comprising the functional domain and the recognition unit under conditions conducive to binding in the presence of a candidate compound and measuring the amount of binding between the molecule and the recognition unit and in which the functional domain of interest is a domain selected from the group consisting of an SH1, SH2, SH3, PH, PTB, LIM, armadillo, Notch/ankyrin repeat, zinc finger, leucine zipper, and helix-turn-helix;

(b) comparing the amount of binding in step (a) with the amount of binding known or determined to occur between the molecule and the recognition unit in the absence of the candidate compound, where a difference in the amount of binding between step (a) and the amount of binding known or determined to occur between the molecule and the recognition unit in the absence of the candidate compound indicates that the candidate compound is a compound that affects the binding of the molecule comprising a functional domain and the recognition unit.

In a specific embodiment, the functional domain is an SH3 domain.

5.6. Use of More Than One Recognition Unit Simultaneously

It has been found that when screening a source of polypeptides with a recognition unit, it is possible to use more than one recognition unit at the same time. In particular, it has been found that as many as five different recognition units may be used simultaneously to screen a source of polypeptides.

In particular, when the recognition units are biotinylated peptides and the source of polypeptides is a cDNA expression library, the steps of preconjugation of the biotinylated peptides to streptavidin-alkaline phosphatase as well as the steps involved in screening the cDNA expression library may be carried out in essentially the same manner as is done when a single biotinylated peptide is used as a recognition unit. See Section 6.1 for details. The key difference when using more than one biotinylated peptide at a time is that the peptides are combined either before or at the step where they are placed in contact with the polypeptides from which selection occurs.

In an embodiment employing a bacteriophage expression library to express the polypeptides, when the positive clones are worked up to the level of isolated plaques, the clonal bacteriophage from the isolated plaques may be tested against each of the biotinylated peptides individually, in order to determine to which of the several peptides that were used as recognition units in the primary screen the phage are actually binding.

5.7. Use of Recognition Units from Known Amino Acid Sequences

In many cases it may not be necessary to screen a collection of substances, e.g., a peptide library, in order to obtain a recognition unit for a given functional domain. In the case of peptide recognition units, for example, it is sometimes possible to identify a recognition unit by inspection of known amino acid sequences. Stretches of these amino acid sequences that resemble known binding sequences for the functional domain can be synthesized and screened against a source of polypeptides in order to obtain a plurality of polypeptides comprising the given functional domain.
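
Inspection of known amino acid sequences for stretches that resemble a known binding sequence can be sketched as a simple pattern search. The example below is an illustration only: the default pattern `P..P` (a proline-rich PxxP core of the kind commonly reported for SH3 ligands), the protein sequence, the flank length, and the function name `find_candidate_stretches` are all assumptions rather than parameters taken from this specification.

```python
import re

def find_candidate_stretches(sequence, pattern=r"(?=(P..P))", flank=3):
    """Return (start, stretch) for each motif match, including flanking residues.

    `start` is the 0-based index of the motif within `sequence`. The
    lookahead in the default pattern allows overlapping matches to be found.
    """
    hits = []
    for match in re.finditer(pattern, sequence):
        start, end = match.start(1), match.end(1)
        lo = max(0, start - flank)
        hi = min(len(sequence), end + flank)
        hits.append((start, sequence[lo:hi]))
    return hits

protein = "MKTAYRPLPPLPGSTEEK"  # hypothetical published sequence
for start, stretch in find_candidate_stretches(protein):
    print(start, stretch)
```

Each returned stretch is a candidate peptide that could be synthesized and screened against a source of polypeptides as described above.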

Prior to the disclosure of the present invention of methods of preparing recognition units having generic specificity, it would have been thought fruitless to pursue this approach. The expectation would have been that a recognition unit, chosen from published amino acid sequences as described above, would have been useful, at best, to identify a single protein containing a functional domain.

5.8. Isolation and Expression of Nucleic Acids Encoding Polypeptides Comprising a Functional Domain

In particular aspects, the invention provides amino acid sequences of polypeptides comprising functional domains, preferably human polypeptides, and fragments and derivatives thereof which comprise an antigenic determinant (i.e., can be recognized by an antibody) or which are functionally active, as well as nucleic acid sequences encoding the foregoing. “Functionally active” material as used herein refers to that material displaying one or more functional activities, e.g., a biological activity, antigenicity (capable of binding to an antibody), immunogenicity, or comprising a functional domain that is capable of specific binding to a recognition unit. In specific embodiments, the invention provides fragments of polypeptides comprising a functional domain consisting of at least 40 amino acids, or of at least 75 amino acids. Nucleic acids encoding the foregoing are provided. Functional fragments of at least 10 or 20 amino acids are also provided.

In other specific embodiments, the invention provides nucleotide sequences and subsequences encoding polypeptides comprising a functional domain, preferably human polypeptides, consisting of at least 25 nucleotides, at least 50 nucleotides, or at least 150 nucleotides. Nucleic acids encoding fragments of the polypeptides comprising a functional domain are provided, as well as nucleic acids complementary to and capable of hybridizing to such nucleic acids. In one embodiment, such a complementary sequence may be complementary to a cDNA sequence encoding a polypeptide comprising a functional domain of at least 25 nucleotides, or of at least 100 nucleotides. In a preferred aspect, the invention utilizes cDNA sequences encoding human polypeptides comprising a functional domain or a portion thereof.

Any eukaryotic cell can potentially serve as the nucleic acid source for the molecular cloning of polypeptides comprising a functional domain. The DNA may be obtained by standard procedures known in the art (e.g., a DNA “library”) by cDNA cloning, or by the cloning of genomic DNA, or fragments thereof, purified from the desired cell (see, for example, Sambrook et al., 1989, Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory, 2d. Ed., Cold Spring Harbor, N.Y.; Glover, D. M. (ed.), 1985, DNA Cloning: A Practical Approach, IRL Press, Ltd., Oxford, U.K. Vol. I, II). Clones derived from genomic DNA may contain regulatory and intron DNA regions in addition to coding regions; clones derived from cDNA will contain only exon sequences. Whatever the source, the gene encoding a polypeptide comprising a functional domain should be molecularly cloned into a suitable vector for propagation of the gene.

In the molecular cloning of the gene from genomic DNA, DNA fragments are generated, some of which will encode the desired gene. The DNA may be cleaved at specific sites using various restriction enzymes. Alternatively, one may use DNAse in the presence of manganese to fragment the DNA, or the DNA can be physically sheared, as for example, by sonication. The linear DNA fragments can then be separated according to size by standard techniques, including but not limited to, agarose and polyacrylamide gel electrophoresis and column chromatography.
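
Cleavage at specific sites followed by separation according to size can be illustrated with an in silico digest. The sketch below assumes a linear DNA molecule and uses the EcoRI recognition sequence (GAATTC, cut after the first G) as an example enzyme; the DNA sequence and the function name `digest` are hypothetical.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut a linear DNA sequence at every occurrence of `site`.

    `cut_offset` is the number of bases into the site at which the top
    strand is cut (1 for EcoRI, which cuts G^AATTC).
    """
    fragments = []
    previous = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(sequence[previous:cut])
        previous = cut
        position = sequence.find(site, position + 1)
    fragments.append(sequence[previous:])
    return fragments

dna = "ATGCGAATTCTTAGGCCGAATTCAAT"  # hypothetical sequence with two sites
# Fragment lengths, largest first, as they would be ordered by size separation
fragment_sizes = sorted((len(f) for f in digest(dna)), reverse=True)
print(fragment_sizes)
```

A list of predicted fragment sizes of this kind is what would be compared with observed fragment sizes when a restriction map is available.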

Once a gene encoding a particular polypeptide comprising a functional domain has been isolated from a first species, it is a routine matter to isolate the corresponding gene from another species. Identification of the specific DNA fragment from another species containing the desired gene may be accomplished in a number of ways. For example, if an amount of a portion of a gene or its specific RNA from the first species, or a fragment thereof, e.g., the functional domain, is available and can be purified and labeled, the generated DNA fragments from another species may be screened by nucleic acid hybridization to the labeled probe (Benton, W. and Davis, R., 1977, Science 196, 180; Grunstein, M. and Hogness, D., 1975, Proc. Natl. Acad. Sci. U.S.A. 72, 3961). Those DNA fragments with substantial homology to the probe will hybridize. In a preferred embodiment, PCR using primers that hybridize to a known sequence of a gene of one species can be used to amplify the homolog of such gene in a different species. The amplified fragment can then be isolated and inserted into an expression or cloning vector. It is also possible to identify the appropriate fragment by restriction enzyme digestion(s) and comparison of fragment sizes with those expected according to a known restriction map if such is available. Further selection can be carried out on the basis of the properties of the gene. Alternatively, the presence of the gene may be detected by assays based on the physical, chemical, or immunological properties of its expressed product. For example, cDNA clones, or DNA clones which hybrid-select the proper mRNAs, can be selected which produce a protein that, e.g., has similar or identical electrophoretic migration, isoelectric focusing behavior, proteolytic digestion maps, in vitro aggregation activity (“adhesiveness”) or antigenic properties as known for the particular polypeptide comprising a functional domain from the first species. If an antibody to that particular polypeptide is available, the corresponding polypeptide from another species may be identified by binding of labeled antibody to the putatively polypeptide synthesizing clones, in an ELISA (enzyme-linked immunosorbent assay)-type procedure.

Genes encoding polypeptides comprising a functional domain can also be identified by mRNA selection by nucleic acid hybridization followed by in vitro translation. In this procedure, fragments are used to isolate complementary mRNAs by hybridization. Such DNA fragments may represent available, purified DNA of genes encoding polypeptides comprising a functional domain of a first species. Immunoprecipitation analysis or functional assays (e.g., ability to bind to a recognition unit) of the in vitro translation products of the isolated mRNAs identifies the mRNA and, therefore, the complementary DNA fragments that contain the desired sequences. In addition, specific mRNAs may be selected by adsorption of polysomes isolated from cells to immobilized antibodies specifically directed against polypeptides comprising a functional domain. A radiolabelled cDNA of a gene encoding a polypeptide comprising a functional domain can be synthesized using the selected mRNA (from the adsorbed polysomes) as a template. The radiolabelled mRNA or cDNA may then be used as a probe to identify the DNA fragments that represent the gene encoding the polypeptide comprising a functional domain of another species from among other genomic DNA fragments. In a specific embodiment, human homologs of mouse genes are obtained by methods described above. In various embodiments, the human homolog is hybridizable to the mouse homolog under conditions of low, moderate, or high stringency. By way of example and not limitation, procedures using such conditions of low stringency are as follows (see also Shilo and Weinberg, 1981, Proc. Natl. Acad. Sci. USA 78:6789-6792): Filters containing DNA are pretreated for 6 h at 40° C. in a solution containing 35% formamide, 5×SSC, 50 mM Tris-HCl (pH 7.5), 5 mM EDTA, 0.1% PVP, 0.1% Ficoll, 1% BSA, and 500 μg/ml denatured salmon sperm DNA. Hybridizations are carried out in the same solution with the following modifications: 0.02% PVP, 0.02% Ficoll, 0.2% BSA, 100 μg/ml salmon sperm DNA, 10% (wt/vol) dextran sulfate, and 5-20×10⁶ cpm ³²P-labeled probe is used. Filters are incubated in hybridization mixture for 18-20 h at 40° C., and then washed for 1.5 h at 55° C. in a solution containing 2×SSC, 25 mM Tris-HCl (pH 7.4), 5 mM EDTA, and 0.1% SDS. The wash solution is replaced with fresh solution and incubated an additional 1.5 h at 60° C. Filters are blotted dry and exposed for autoradiography. If necessary, filters are washed for a third time at 65-68° C. and reexposed to film. Other conditions of low stringency which may be used are well known in the art (e.g., as employed for cross-species hybridizations).

By way of example and not limitation, procedures using conditions of high stringency are as follows: Prehybridization of filters containing DNA is carried out for 8 h to overnight at 65° C. in buffer composed of 6×SSC, 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.02% PVP, 0.02% Ficoll, 0.02% BSA, and 500 μg/ml denatured salmon sperm DNA. Filters are hybridized for 48 h at 65° C. in prehybridization mixture containing 100 μg/ml denatured salmon sperm DNA and 5-20×10⁶ cpm of ³²P-labeled probe. Washing of filters is done at 37° C. for 1 h in a solution containing 2×SSC, 0.01% PVP, 0.01% Ficoll, and 0.01% BSA. This is followed by a wash in 0.1×SSC at 50° C. for 45 min before autoradiography. Other conditions of high stringency which may be used are well known in the art.
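
One way to compare the wash steps of the low and high stringency procedures is by the salt contributed by the SSC, since lower salt generally makes a wash more stringent. The sketch below assumes the standard composition of 1×SSC (150 mM NaCl, 15 mM trisodium citrate), which is not stated in this specification, and it ignores sodium contributed by other buffer components; the function name `sodium_mM` is hypothetical.

```python
NACL_MM_PER_1X = 150.0     # assumed: 1x SSC contains 150 mM NaCl
CITRATE_MM_PER_1X = 15.0   # assumed: 1x SSC contains 15 mM trisodium citrate

def sodium_mM(ssc_fold):
    """Approximate Na+ concentration (mM) contributed by SSC at a given fold."""
    # Each trisodium citrate contributes three sodium ions
    return ssc_fold * (NACL_MM_PER_1X + 3 * CITRATE_MM_PER_1X)

washes = {
    "low stringency washes (2x SSC)": 2.0,
    "high stringency final wash (0.1x SSC)": 0.1,
}
for name, fold in washes.items():
    print(f"{name}: about {sodium_mM(fold):.1f} mM Na+")
```

Temperature, formamide, and wash duration also affect stringency, so this comparison covers only one of the variables listed in the procedures above.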

The identified and isolated gene encoding a polypeptide comprising a functional domain can then be inserted into an appropriate cloning vector. A large number of vector-host systems known in the art may be used. Possible vectors include, but are not limited to, plasmids or modified viruses, but the vector system must be compatible with the host cell used. Such vectors include, but are not limited to, bacteriophages such as lambda derivatives, or plasmids such as pBR322 or pUC plasmid derivatives. The insertion into a cloning vector can, for example, be accomplished by ligating the DNA fragment into a cloning vector which has complementary cohesive termini. However, if the complementary restriction sites used to fragment the DNA are not present in the cloning vector, the ends of the DNA molecules may be enzymatically modified. Alternatively, any site desired may be produced by ligating nucleotide sequences (linkers) onto the DNA termini; these ligated linkers may comprise specific chemically synthesized oligonucleotides encoding restriction endonuclease recognition sequences. In an alternative method, the cleaved vector and gene may be modified by homopolymeric tailing. Recombinant molecules can be introduced into host cells via transformation, transfection, infection, electroporation, etc., so that many copies of the gene sequence are generated.

In an alternative method, the desired gene may be identified and isolated after insertion into a suitable cloning vector in a “shot gun” approach. Enrichment for the desired gene, for example, by size fractionation, can be done before insertion into the cloning vector.

In specific embodiments, transformation of host cells with recombinant DNA molecules that incorporate the isolated gene, cDNA, or synthesized DNA sequence enables generation of multiple copies of the gene. Thus, the gene may be obtained in large quantities by growing transformants, isolating the recombinant DNA molecules from the transformants and, when necessary, retrieving the inserted gene from the isolated recombinant DNA.

The nucleic acid coding for a polypeptide comprising a functional domain of the invention can be inserted into an appropriate expression vector, i.e., a vector which contains the necessary elements for the transcription and translation of the inserted protein-coding sequence. The necessary transcriptional and translational signals can also be supplied by the native gene encoding the polypeptide and/or its flanking regions. A variety of host-vector systems may be utilized to express the protein-coding sequence. These include but are not limited to mammalian cell systems infected with virus (e.g., vaccinia virus, adenovirus, etc.); insect cell systems infected with virus (e.g., baculovirus); microorganisms such as yeast containing yeast vectors, or bacteria transformed with bacteriophage DNA, plasmid DNA, or cosmid DNA. The expression elements of vectors vary in their strengths and specificities. Depending on the host-vector system utilized, any one of a number of suitable transcription and translation elements may be used.

Any of the methods previously described for the insertion of DNA fragments into a vector may be used to construct expression vectors containing a chimeric gene consisting of appropriate transcriptional/translational control signals and the protein coding sequences. These methods may include in vitro recombinant DNA and synthetic techniques and in vivo recombinants (genetic recombination). Expression of nucleic acid sequence encoding a protein or peptide fragment may be regulated by a second nucleic acid sequence so that the protein or peptide is expressed in a host transformed with the recombinant DNA molecule. For example, expression of a protein may be controlled by any promoter/enhancer element known in the art. Promoters which may be used to control gene expression include, but are not limited to, the SV40 early promoter region (Benoist and Chambon, 1981, Nature 290, 304-310), the promoter contained in the 3′ long terminal repeat of Rous sarcoma virus (Yamamoto, et al., 1980, Cell 22, 787-797), the herpes thymidine kinase promoter (Wagner et al., 1981, Proc. Natl. Acad. Sci. U.S.A. 78, 1441-1445), the regulatory sequences of the metallothionein gene (Brinster et al., 1982, Nature 296, 39-42); prokaryotic expression vectors such as the β-lactamase promoter (Villa-Komaroff, et al., 1978, Proc. Natl. Acad. Sci. U.S.A. 75, 3727-3731), or the tac promoter (DeBoer, et al., 1983, Proc. Natl. Acad. Sci. U.S.A. 80, 21-25); see also “Useful proteins from recombinant bacteria” in Scientific American, 1980, 242, 74-94; plant expression vectors comprising the nopaline synthetase promoter region (Herrera-Estrella et al., Nature 303, 209-213) or the cauliflower mosaic virus 35S RNA promoter (Gardner, et al., 1981, Nucl. Acids Res. 9, 2871), and the promoter of the photosynthetic enzyme ribulose biphosphate carboxylase (Herrera-Estrella et al., 1984, Nature 310, 115-120); promoter elements from yeast or other fungi such as the Gal 4 promoter, the ADC (alcohol dehydrogenase) promoter, PGK (phosphoglycerol kinase) promoter, alkaline phosphatase promoter, and the following animal transcriptional control regions, which exhibit tissue specificity and have been utilized in transgenic animals: elastase I gene control region which is active in pancreatic acinar cells (Swift et al., 1984, Cell 38, 639-646; Ornitz et al., 1986, Cold Spring Harbor Symp. Quant. Biol. 50, 399-409; MacDonald, 1987, Hepatology 7, 425-515); insulin gene control region which is active in pancreatic beta cells (Hanahan, 1985, Nature 315, 115-122), immunoglobulin gene control region which is active in lymphoid cells (Grosschedl et al., 1984, Cell 38, 647-658; Adames et al., 1985, Nature 318, 533-538; Alexander et al., 1987, Mol. Cell. Biol. 7, 1436-1444), mouse mammary tumor virus control region which is active in testicular, breast, lymphoid and mast cells (Leder et al., 1986, Cell 45, 485-495), albumin gene control region which is active in liver (Pinkert et al., 1987, Genes and Devel. 1, 268-276), alpha-fetoprotein gene control region which is active in liver (Krumlauf et al., 1985, Mol. Cell. Biol. 5, 1639-1648; Hammer et al., 1987, Science 235, 53-58); alpha 1-antitrypsin gene control region which is active in the liver (Kelsey et al., 1987, Genes and Devel. 1, 161-171), beta-globin gene control region which is active in myeloid cells (Mogram et al., 1985, Nature 315, 338-340; Kollias et al., 1986, Cell 46, 89-94); myelin basic protein gene control region which is active in oligodendrocyte cells in the brain (Readhead et al., 1987, Cell 48, 703-712); myosin light chain-2 gene control region which is active in skeletal muscle (Sani, 1985, Nature 314, 283-286), and gonadotropic releasing hormone gene control region which is active in the hypothalamus (Mason et al., 1986, Science 234, 1372-1378).

Expression vectors containing inserts of genes encoding polypeptides comprising a functional domain can be identified by three general approaches: (a) nucleic acid hybridization, (b) presence or absence of “marker” gene functions, and (c) expression of inserted sequences. In the first approach, the presence of a foreign gene inserted in an expression vector can be detected by nucleic acid hybridization using probes comprising sequences that are homologous to the inserted gene. In the second approach, the recombinant vector/host system can be identified and selected based upon the presence or absence of certain “marker” gene functions (e.g., thymidine kinase activity, resistance to antibiotics, transformation phenotype, occlusion body formation in baculovirus, etc.) caused by the insertion of foreign genes in the vector. For example, if the gene encoding a polypeptide comprising a functional domain is inserted within the marker gene sequence of the vector, recombinants containing the gene can be identified by the absence of the marker gene function. In the third approach, recombinant expression vectors can be identified by assaying the foreign gene product expressed by the recombinant. Such assays can be based, for example, on the physical or functional properties of the gene product in in vitro assay systems, e.g., ability to bind to recognition units.

Once a particular recombinant DNA molecule is identified and isolated, several methods known in the art may be used to propagate it. Once a suitable host system and growth conditions are established, recombinant expression vectors can be propagated and prepared in quantity. As previously explained, the expression vectors which can be used include, but are not limited to, the following vectors or their derivatives: human or animal viruses such as vaccinia virus or adenovirus; insect viruses such as baculovirus; yeast vectors; bacteriophage vectors (e.g., lambda), and plasmid and cosmid DNA vectors, to name but a few.

In addition, a host cell strain may be chosen which modulates the expression of the inserted sequences, or modifies and processes the gene product in the specific fashion desired. Expression from certain promoters can be elevated in the presence of certain inducers; thus, expression of the protein may be controlled. Furthermore, different host cells have characteristic and specific mechanisms for the translational and post-translational processing and modification (e.g., glycosylation, cleavage) of proteins. Appropriate cell lines or host systems can be chosen to ensure the desired modification and processing of the foreign protein expressed. For example, expression in a bacterial system can be used to produce an unglycosylated core protein product. Expression in yeast will produce a glycosylated product. Expression in mammalian cells can be used to ensure “native” glycosylation of a heterologous protein. Furthermore, different vector/host expression systems may effect processing reactions such as proteolytic cleavages to different extents.

In other specific embodiments, polypeptides comprising a functional domain, or fragments, analogs, or derivatives thereof may be expressed as a fusion, or chimeric protein product (comprising the polypeptide, fragment, analog, or derivative joined via a peptide bond to a heterologous protein sequence (of a different protein)). Such a chimeric product can be made by ligating the appropriate nucleic acid sequences encoding the desired amino acid sequences to each other by methods known in the art, in the proper reading frame, and expressing the chimeric product by methods commonly known in the art. Alternatively, such a chimeric product may be made by protein synthetic techniques, e.g., by use of a peptide synthesizer.

5.8.1 Identification and Purification of the Expressed Gene Product

Once a recombinant which expresses the gene sequence encoding a polypeptide comprising a functional domain is identified, the gene product may be analyzed. This can be achieved by assays based on the physical or functional properties of the product, including radioactive labelling of the product followed by analysis by gel electrophoresis.

Once the polypeptide comprising a functional domain is identified, it may be isolated and purified by standard methods including chromatography (e.g., ion exchange, affinity, and sizing column chromatography), centrifugation, differential solubility, or by any other standard technique for the purification of proteins. The functional properties may be evaluated using any suitable assay, including, but not limited to, binding to a recognition unit.

5.9 Derivatives and Analogs of Polypeptides Comprising a Functional Domain

The invention further provides derivatives (including but not limited to fragments) and analogs of polypeptides that are functionally active, e.g., comprising a functional domain. In a specific embodiment, the derivative or analog is functionally active, i.e., capable of exhibiting one or more functional activities associated with a full-length, wild-type polypeptide, e.g., binding to a recognition unit. As one example, such derivatives or analogs may have the antigenicity of the full-length polypeptide.

In particular, derivatives can be made by altering gene sequences encoding polypeptides comprising a functional domain by substitutions, additions or deletions that provide for functionally equivalent molecules. Due to the degeneracy of nucleotide coding sequences, other DNA sequences which encode substantially the same amino acid sequence as a gene encoding a polypeptide comprising a functional domain may be used in the practice of the present invention. These include but are not limited to nucleotide sequences comprising all or portions of such genes which are altered by the substitution of different codons that encode a functionally equivalent amino acid residue within the sequence, thus producing a silent change. Likewise, the derivatives of the invention include, but are not limited to, those containing, as a primary amino acid sequence, all or part of the amino acid sequence of a polypeptide comprising a functional domain including altered sequences in which functionally equivalent amino acid residues are substituted for residues within the sequence resulting in a silent change. For example, one or more amino acid residues within the sequence can be substituted by another amino acid of a similar polarity which acts as a functional equivalent, resulting in a silent alteration. Substitutes for an amino acid within the sequence may be selected from other members of the class to which the amino acid belongs. For example, the nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan and methionine. The polar neutral amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine. The positively charged (basic) amino acids include arginine, lysine and histidine. The negatively charged (acidic) amino acids include aspartic acid and glutamic acid.
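The class-based substitution rule described above can be illustrated with a short sketch. It encodes the four residue classes exactly as listed in the preceding paragraph (in one-letter code) and reports which positions of a variant fall outside the class of the original residue. The function names and the example variant sequence are hypothetical and serve only as illustration; whether a given within-class substitution actually preserves function must still be confirmed by assay, e.g., binding to a recognition unit.

```python
# Residue classes as listed in the text, written in one-letter code.
RESIDUE_CLASSES = {
    "nonpolar": set("ALIVPFWM"),      # Ala, Leu, Ile, Val, Pro, Phe, Trp, Met
    "polar_neutral": set("GSTCYNQ"),  # Gly, Ser, Thr, Cys, Tyr, Asn, Gln
    "basic": set("RKH"),              # Arg, Lys, His
    "acidic": set("DE"),              # Asp, Glu
}


def residue_class(aa: str) -> str:
    """Return the name of the class to which a one-letter residue belongs."""
    aa = aa.upper()
    for name, members in RESIDUE_CLASSES.items():
        if aa in members:
            return name
    raise ValueError(f"not one of the twenty common amino acids: {aa!r}")


def is_conservative(original: str, substitute: str) -> bool:
    """True if both residues belong to the same class."""
    return residue_class(original) == residue_class(substitute)


def nonconservative_positions(reference: str, variant: str) -> list:
    """1-based positions where the variant leaves the reference residue's class."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        pos
        for pos, (a, b) in enumerate(zip(reference, variant), start=1)
        if a != b and not is_conservative(a, b)
    ]


# Hypothetical variant of the T12SRC.1 peptide sequence (SEQ ID NO:3):
# L3V and R10K stay within their classes; R13E does not.
reference = "GILAPPVPPRNTR"
variant = "GIVAPPVPPKNTE"
flagged = nonconservative_positions(reference, variant)
```

Here `flagged` lists only position 13, since the leucine-to-valine and arginine-to-lysine changes remain within the nonpolar and basic classes, respectively.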

Derivatives or analogs of genes encoding polypeptides comprising a functional domain include but are not limited to those polypeptides which are substantially homologous to the genes or fragments thereof, or whose encoding nucleic acid is capable of hybridizing to a nucleic acid sequence of the genes.

The derivatives and analogs of the invention can be produced by various methods known in the art. The manipulations which result in their production can occur at the gene or protein level. For example, the cloned gene sequence can be modified by any of numerous strategies known in the art (Maniatis, T., 1989, Molecular Cloning, A Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.). The sequence can be cleaved at appropriate sites with restriction endonuclease(s), followed by further enzymatic modification if desired, isolated, and ligated in vitro. PCR primers can be constructed so as to introduce desired sequence changes during PCR amplification of a nucleic acid encoding the desired polypeptide. In the production of the gene encoding a derivative or analog, care should be taken to ensure that the modified gene remains within the same translational reading frame, uninterrupted by translational stop signals, in the gene region where the desired activity is encoded.
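The reading-frame precaution noted above lends itself to a simple check. The following is a minimal sketch, assuming the standard genetic code stop codons (TAA, TAG, TGA) and a coding sequence that starts in frame; it treats a modification as frame-preserving when the length changes by a multiple of three and no in-frame stop codon appears. The function names and the short test sequences are hypothetical.

```python
from typing import List, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def codons(seq: str) -> List[str]:
    """Split a coding sequence into complete codons, starting at position 0."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def first_in_frame_stop(seq: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop, or None."""
    for index, codon in enumerate(codons(seq)):
        if codon in STOP_CODONS:
            return index
    return None


def preserves_reading_frame(original: str, modified: str) -> bool:
    """True if the length change is a multiple of 3 and no in-frame stop occurs.

    Simplifying assumption: the whole sequence is the region where the
    desired activity is encoded.
    """
    same_frame = (len(modified) - len(original)) % 3 == 0
    return same_frame and first_in_frame_stop(modified) is None


original = "ATGGGTATTCTGGCTCCG"          # hypothetical 6-codon fragment
with_insert = "ATGGGTGCTATTCTGGCTCCG"    # one whole codon (GCT) inserted
frameshift = "ATGGTATTCTGGCTCCG"         # one base deleted
with_stop = "ATGGGTTAACTGGCTCCG"         # third codon changed to TAA
```

A check of this kind would be applied only to the gene region where the desired activity is encoded; a natural termination codon at the end of the full open reading frame would need to be excluded from the input.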

Additionally, the sequence of the genes encoding polypeptides comprising a functional domain can be mutated in vitro or in vivo, to create and/or destroy translation, initiation, and/or termination sequences, or to create variations in coding regions and/or form new restriction endonuclease sites or destroy preexisting ones, to facilitate further in vitro modification. Any technique for mutagenesis known in the art can be used, including but not limited to, in vitro site-directed mutagenesis (Hutchinson, C., et al., 1978, J. Biol. Chem. 253:6551), use of TAB® linkers (Pharmacia), etc.

Manipulations of the sequence may also be made at the protein level. Included within the scope of the invention are protein fragments or other derivatives or analogs which are differentially modified during or after translation, e.g., by glycosylation, acetylation, phosphorylation, amidation, derivatization by known protecting/blocking groups, proteolytic cleavage, linkage to an antibody molecule or other cellular ligand, etc. Any of numerous chemical modifications may be carried out by known techniques, including but not limited to specific chemical cleavage by cyanogen bromide, trypsin, chymotrypsin, papain, V8 protease, NaBH₄; acetylation, formylation, oxidation, reduction; metabolic synthesis in the presence of tunicamycin; etc.

In addition, analogs and derivatives can be chemically synthesized. For example, a peptide corresponding to a portion of a polypeptide comprising a functional domain can be synthesized by use of a peptide synthesizer.

Furthermore, if desired, nonclassical amino acids or chemical amino acid analogs can be introduced as a substitution or addition into the sequence. Non-classical amino acids include but are not limited to the D-isomers of the common amino acids, α-amino isobutyric acid, 4-aminobutyric acid, hydroxyproline, sarcosine, citrulline, cysteic acid, t-butylglycine, t-butylalanine, phenylglycine, cyclohexylalanine, β-alanine, designer amino acids such as β-methyl amino acids, Cα-methyl amino acids, and Nα-methyl amino acids.

5.10 Antibodies to Polypeptides Comprising a Functional Domain

According to one embodiment, the invention provides antibodies and fragments thereof containing the binding domain thereof, directed against polypeptides comprising a functional domain. Accordingly, polypeptides comprising a functional domain, fragments or analogs or derivatives thereof, in particular, may be used as immunogens to generate antibodies against such polypeptides, fragments or analogs or derivatives. Such antibodies can be polyclonal, monoclonal, chimeric, single chain, Fab fragments, or from an Fab expression library. In a specific embodiment, antibodies specific to the functional domain of a polypeptide comprising a functional domain may be prepared.

Various procedures known in the art may be used for the production of polyclonal antibodies. In a particular embodiment, rabbit polyclonal antibodies to an epitope of a polypeptide comprising a functional domain, or a subsequence thereof, can be obtained. For the production of antibody, various host animals can be immunized by injection with the native polypeptide comprising a functional domain, or a synthetic version, or fragment thereof, including but not limited to rabbits, mice, rats, etc. Various adjuvants may be used to increase the immunological response, depending on the host species, and including but not limited to Freund's (complete and incomplete), mineral gels such as aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanins, dinitrophenol, and potentially useful human adjuvants such as BCG (bacille Calmette-Guerin) and Corynebacterium parvum.

For preparation of monoclonal antibodies, any technique which provides for the production of antibody molecules by continuous cell lines in culture may be used. Examples include the hybridoma technique originally developed by Kohler and Milstein (1975, Nature 256, 495-497), as well as the trioma technique, the human B-cell hybridoma technique (Kozbor et al., 1983, Immunology Today 4, 72), and the EBV-hybridoma technique to produce human monoclonal antibodies (Cole et al., 1985, in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96).

Antibody fragments which contain the idiotype (binding domain) of the molecule can be generated by known techniques. For example, such fragments include but are not limited to: the F(ab′)₂ fragment which can be produced by pepsin digestion of the antibody molecule; the Fab′ fragments which can be generated by reducing the disulfide bridges of the F(ab′)₂ fragment, and the Fab fragments which can be generated by treating the antibody molecule with papain and a reducing agent.

In the production of antibodies, screening for the desired antibody can be accomplished by techniques known in the art, e.g., ELISA (enzyme-linked immunosorbent assay).

6. Examples

6.1. Identification of Genes from cDNA Expression Libraries

A study was initiated to determine whether peptide recognition units could recognize functional domains that are the same as or similar to their target functional domain but that are contained in proteins other than the protein containing their target functional domain. Such “functional” screens, using recognition units of relatively small size, were not previously known and were difficult to develop because of the low degree of sequence homology among functional domain-containing proteins. Thus, for example, an oligonucleotide probe could not be designed with any degree of confidence based on the low degree of homology of primary sequences of SH3 domains.

Using SH3 domain-binding peptides from combinatorial peptide libraries as recognition units, we screened a series of mouse and human cDNA expression libraries. We found that 69 of the 74 clones isolated from the libraries encoded at least one SH3 domain. These clones represent more than 18 different SH3 domain-containing proteins, of which more than 10 have not been described previously.

The initial recognition unit chosen was a Src SH3 domain-binding peptide (termed pSrcCII) isolated from a phage-displayed random peptide library (Sparks et al., 1994, J. Biol. Chem. 269:23853-23856). pSrcCII was (biotin-SGSGGILAPPVPPRNTR-NH₂) (SEQ ID NO:1). pSrcCII was synthesized by standard FMOC chemistry, purified by HPLC, and its structure was confirmed by mass spectrometry and amino acid analysis. To form multivalent complexes, 50 pmol biotinylated pSrcCII peptide was incubated with 2 μg streptavidin-alkaline phosphatase (SA-AP) (for a biotin:biotin-binding site ratio of 1:1). Excess biotin-binding sites were blocked by addition of 500 pmol biotin. Alternatively, 31.2 μl of 1 mg/ml SA-AP could have been incubated with 15 μl of 0.1 mM biotinylated peptide for 30 min at 4° C. Ten μl of 0.1 mM biotin would then be added, and the solution incubated for an additional 15 min.
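The volumes and concentrations given for the alternative complexing procedure can be converted to absolute amounts with simple unit arithmetic. The sketch below is illustrative only; the helper names are hypothetical, and the only inputs are the volumes and concentrations stated above.

```python
def picomoles(volume_ul: float, concentration_mm: float) -> float:
    """Amount in pmol contained in volume_ul microliters of a concentration_mm mM solution."""
    # 1 ul of a 1 mM solution contains 1 nmol, i.e. 1000 pmol.
    return volume_ul * concentration_mm * 1000.0


def micrograms(volume_ul: float, concentration_mg_per_ml: float) -> float:
    """Mass in ug contained in volume_ul microliters of a solution given in mg/ml."""
    # 1 mg/ml is the same as 1 ug/ul.
    return volume_ul * concentration_mg_per_ml


# Alternative procedure described in the text.
peptide_pmol = picomoles(15, 0.1)      # 15 ul of 0.1 mM biotinylated peptide
biotin_pmol = picomoles(10, 0.1)       # 10 ul of 0.1 mM biotin
sa_ap_ug = micrograms(31.2, 1.0)       # 31.2 ul of 1 mg/ml SA-AP

print(f"peptide: {peptide_pmol:.0f} pmol, biotin: {biotin_pmol:.0f} pmol, SA-AP: {sa_ap_ug:.1f} ug")
```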

A λEXlox mouse 16 day embryo cDNA expression library was obtained from Novagen (Madison, Wis.). The cDNA library was screened according to published protocols (Young and Davis, 1983, Proc. Natl. Acad. Sci. USA 80:1194-1198). The library was plated at an initial density of 30,000 plaques/100 mm petri plate as follows. A library aliquot was diluted 1:1000 in SM (100 mM NaCl, 8 mM MgSO₄, 50 mM Tris HCl pH 7.5, 0.01% gelatin). Three μl of diluted phage were added to 1.5 ml each of SM, 10 mM CaCl₂/MgCl₂, and an overnight culture of BL21(DE3) pLysE E. coli cells. BL21 overnight cultures were grown in 2xYT medium (1.6% tryptone, 1% yeast extract, and 0.5% NaCl) supplemented with 10 mM MgSO₄, 0.2% maltose, and 25 μg/ml chloramphenicol. This mixture was incubated 20 min at 37° C., after which 300 μl were plated on each of 14 2xYT agar plates in 3 ml 0.8% 2xYT top agarose containing 25 μg/ml chloramphenicol. Plaques were allowed to form for 6 hours at 37° C., after which isopropyl-β-D-thiogalactopyranoside (IPTG)-soaked filters were applied. After an additional eight hours incubation at 37° C., the filters were marked, removed from the plates, and washed three times with phosphate buffered saline (PBS; 137 mM NaCl, 2.7 mM KCl, 4.3 mM Na₂HPO₄, 1.4 mM KH₂PO₄), 0.1% Triton X-100. The filters were blocked for 1 hour in PBS, 2% bovine serum albumin (blocking solution) and subsequently incubated overnight at 4° C. with fresh blocking solution plus streptavidin-alkaline phosphatase (SA-AP) complexed peptide. Approximately 1 μg SA-AP complexed with peptide in 1 ml blocking solution was used for each filter. The filters were then subjected to four 15 minute washes with PBS, 0.1% Triton X-100. Bound SA-AP-peptide complexes were detected by incubation with 44 μl nitroblue tetrazolium chloride (NBT, 75 mg/ml in 70% dimethylformamide) and 33 μl of 5-bromo-4-chloro-3-indolyl-phosphate-p-toluidine salt (BCIP, 50 mg/ml in dimethylformamide) in 10 ml of alkaline phosphatase buffer (0.1 M Tris-HCl, pH 9.4, 0.1 M NaCl, 50 mM MgCl₂); the signals were robust, often evident within a few minutes. Positive plaques were cored with a Pasteur pipet and placed in 1 ml SM with a drop of chloroform. Lambda phage particles are structurally resistant to chloroform, which serves as a bactericidal agent. These cores were allowed to diffuse into solution for at least 1 hr before subsequent platings. Phage from cores were plated in 100 μl each of SM, 10 mM CaCl₂/MgCl₂, and an overnight culture of BL21(DE3) pLysE cells. Phage were plated with the intention of reducing the number of plaque forming units (pfu)/plate by roughly a factor of 10 with each screen (i.e., 3×10⁴ in the primary screen, 3×10³ in the secondary, and so on). This was accomplished by diluting cores 1:1000 and plating 1-10 μl/plate. Four screens were generally required to obtain isolated plaques.
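The intended plating densities across successive screens follow directly from the tenfold reduction described above. As a minimal sketch (the function name and default arguments are illustrative, taken from the starting density of 3×10⁴ pfu/plate and the roughly tenfold reduction per screen stated in the text):

```python
def target_pfu_per_plate(screen: int, initial_pfu: int = 30_000, fold: int = 10) -> int:
    """Intended pfu/plate for a given screen (1 = primary, 2 = secondary, ...)."""
    if screen < 1:
        raise ValueError("screen numbers start at 1")
    # Each successive screen reduces the density by roughly the given factor.
    return initial_pfu // fold ** (screen - 1)


# Densities for the four screens that were generally required.
densities = [target_pfu_per_plate(n) for n in range(1, 5)]
for number, pfu in enumerate(densities, start=1):
    print(f"screen {number}: about {pfu} pfu/plate")
```

On these assumptions the fourth screen is plated at about 30 pfu per plate, a density at which individual plaques are well separated, consistent with the observation that four screens were generally required to obtain isolated plaques.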

Plasmids were rescued from the λEXlox phage by cre-mediated excision in BM25.8 E. coli cells. For each clone, 5 μl of a 1:100 dilution of phage were added to a solution containing 100 μl SM and 100 μl of an overnight culture of BM25.8 cells (grown in 2xYT media supplemented with 10 mM MgSO₄, 0.2% maltose, 34 μg/ml chloramphenicol, and 50 μg/ml kanamycin). After 30 minutes at 37° C., 100 μl of this solution were spread on an LB amp agarose plate and incubated overnight at 37° C. A single colony from each plate was used to inoculate 3 ml of 2xYT/amp and incubated overnight. Plasmid DNA was purified from the overnight culture using Promega Wizard Miniprep DNA purification kits (Promega, Madison, Wis.), extracted with an equal volume of phenol/chloroform followed by chloroform alone, and ethanol precipitated. This plasmid DNA was used to transform chemical-competent DH5α cells. Three colonies from each transformation were used to inoculate 3 ml cultures; DNA was purified as described above. Approximately 1/20 of each individually purified DNA sample from transformed cells was digested with EcoRI and HindIII and examined by electrophoresis on a 1% agarose gel to determine insert size and DNA quality. One DNA prep for each clone was sequenced either manually using the dideoxy method or by an automated technique that uses fluorescent dideoxynucleotide terminators. The T7-gene 10 primer located approximately 40 bp upstream of the EcoRI restriction site was conveniently used in both cases.

Approximately 100 of 1×10⁶ plaques in the primary screen of the λEXlox 16 day mouse embryo cDNA expression library exhibited significant pSrcCII-binding activity. FIG. 5 is representative of filters from primary and tertiary screens. Of the eighteen positive clones that were isolated and sequenced, all were found to encode proteins with SH3 domains, although several clones appeared to be siblings or to originate from the same mRNA. Thus, the pSrcCII screen resulted in the identification of cDNAs encoding nine distinct SH3 domain-containing proteins (see FIG. 9). The sequences of these proteins were compared to the sequences in GenBank with the computer program BLAST. Three of these proteins corresponded to entries in GenBank. SH3P1 appears to be the murine homologue of p53 bp2, a p53-binding protein (Iwabuchi et al., 1994, Proc. Natl. Acad. Sci. USA 91:6098-6102); SH3P6 resembles human MLN50, a gene amplified in some breast carcinomas (Tomasetto et al., 1995, Genomics 28:367-376); and SH3P5 is Cortactin, a protein implicated in cytoskeletal organization (Wu and Parsons, 1993, J. Cell Biol. 120:1417-1426). Six of the clones did not match entries in GenBank, indicating that the present invention can be used to identify novel SH3 domain-containing proteins. Of these novel proteins, SH3P2 contains three ankyrin repeats and a proline-rich region flanking its SH3 domain; SH3P7 and SH3P9 contain sequences related to regions in the proteins drebrin (Ishikawa et al., 1994, J. Biol. Chem. 269:29928-29933) and amphiphysin (David et al., 1994, FEBS Lett. 351:73-79), respectively. Finally, the novel proteins SH3P4 and SH3P8, although not similar to any known proteins, are highly related (89% amino acid similarity) to one another.

The present invention can be used as part of an iterative process in which a recognition unit is used to identify proteins containing functional domains which are, in turn, used to derive additional recognition units for subsequent screens. For example, to define the binding specificity of these newly cloned SH3 domains, they can be overexpressed as glutathione S-transferase (GST)-fusion proteins in bacteria, which, in turn, can be used to screen a random peptide library in order to obtain recognition units which, in turn, can be used to screen cDNA libraries in order to obtain still more novel proteins containing SH3 domains.

The recognition unit binding preferences of two of the SH3 domains isolated in the pSrcCII screen described above (p53 bp2 and Cortactin) have been described (Sparks et al., 1996, Proc. Natl. Acad. Sci. USA 93:1540-1544). Each of these SH3 domains recognizes recognition unit motifs related to, yet distinct from, the pSrcCII sequence. We used a synthetic peptide (pCort) containing the Cortactin SH3 recognition unit motif to screen the mouse embryo cDNA expression library. pCort was (biotin-SGSGSRLTPQSKPPLPPKPSWVSR-NH₂) (SEQ ID NO:2). pCort was prepared and complexed with SA-AP as above for pSrcCII. Screening of the mouse embryo library with pCort was done as above for pSrcCII.

Twenty-six clones, of varying signal strength, were isolated and twenty-one were found to encode SH3 domain-containing proteins. The pCort screen yielded genes corresponding to nine distinct SH3 domain-containing proteins (see FIG. 9), four of which corresponded to entries in GenBank. SH3P5 and SH3P6 are Cortactin and MLN50, discussed above; SH3P10 matched SPY75/HS1, a protein involved in IgE signaling (Fukamachi et al., 1994, J. Immunol. 152:642-652); and SH3P11 is Crk, an SH2 domain and SH3 domain-containing adaptor molecule (Knudsen et al., 1994, J. Biol. Chem. 269:32781-32787). The five novel transcripts encode SH3P7, SH3P8, and SH3P9, discussed above; SH3P13, an additional member of the SH3P4/SH3P8 family; and SH3P12, a protein with three SH3 domains and a region sharing significant sequence similarity with the peptide hormone sorbin (Vagen-Descroiz M. et al., 1991, Eur. J. Biochem. 201:53-50).

Interestingly, the output from the pCort screen only partially overlapped with that of the pSrcCII screen: four of the nine SH3-containing proteins isolated with pCort were not identified with pSrcCII. In addition, SH3P9, the protein identified most frequently (50%) in the pSrcCII screen, was isolated at a much lower frequency (7%) with the pCort probe. Thus, different recognition units can be used to identify distinct sets of SH3 domains.
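The partial overlap between the two screens can be tallied with set operations. In the sketch below, the pCort set is taken from the proteins named above for that screen; the pSrcCII set is assumed to be SH3P1 through SH3P9, an inference from the clone numbering and the statement that nine distinct proteins were identified, since not every one of the nine is named individually in the description of that screen.

```python
# Assumption: the nine proteins from the pSrcCII screen are SH3P1 to SH3P9.
psrccii_screen = {f"SH3P{n}" for n in range(1, 10)}

# Proteins named in the text for the pCort screen.
pcort_screen = {
    "SH3P5", "SH3P6",              # Cortactin and MLN50
    "SH3P7", "SH3P8", "SH3P9",     # novel, also found with pSrcCII
    "SH3P10", "SH3P11",            # SPY75/HS1 and Crk
    "SH3P12", "SH3P13",            # novel, pCort only
}

shared = psrccii_screen & pcort_screen          # found with both recognition units
pcort_only = pcort_screen - psrccii_screen      # found only with pCort
all_mouse_proteins = psrccii_screen | pcort_screen

print(f"shared: {len(shared)}, pCort only: {len(pcort_only)}, total: {len(all_mouse_proteins)}")
```

Under this assumption the tally gives four pCort-only proteins out of nine and thirteen distinct proteins across both screens, in agreement with the counts stated in the text.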

In addition to possessing at least one SH3 domain, a prominent characteristic of the proteins identified in the pSrcCII and pCort screens is the position of the SH3 domain within the proteins: twelve of thirteen proteins possess SH3 domains near their C-termini. Although pSrcCII binds well to the Src SH3 domain (FIG. 8), Src (whose SH3 domain occurs near the N-terminus) was not identified in the pSrcCII screen. We suspect the bias was a consequence of the fact that the mouse embryo cDNA library was constructed using oligo-dT-primed cDNA. Alternatively, it may be that the mRNA used to prepare the library contained very few, or no, Src transcripts.

A variant of the pSrcCII peptide (T12SRC.1) was used to probe a λgt22a human prostate cancer cell line cDNA library primed with oligo-dT and a λgt11 human bone marrow library primed with random and oligo-dT primers. T12SRC.1 was (biotin-GILAPPVPPRNTR-NH₂) (SEQ ID NO:3). T12SRC.1 was used in the initial screens together with the peptide T12SRC.4. T12SRC.4 was (biotin-VLKRPLPIPPVTR-NH₂) (SEQ ID NO:4). The λgt22a human prostate cancer cell line cDNA library was made from the LNCaP prostate cancer cell line by using standard methods, i.e., the Superscript Lambda system for cDNA synthesis and cloning (Bethesda Research Laboratories, Gaithersburg, Md.). The λgt11 human bone marrow cDNA expression library was obtained from Clontech (Palo Alto, Calif.). The human libraries were screened and positive clones isolated as described above for the mouse 16 day embryo cDNA library, except that cDNA inserts of the λgt11 and λgt22a phage were amplified by PCR rather than being rescued by cre-mediated excision. Of the 1.2×10⁷ λcDNA clones screened from these libraries, 30 exhibited detectable pSrcCII-binding activity. Analysis of the positive clones revealed that they each encoded at least one SH3 domain, and that they originated from a total of six different transcripts (FIG. 9). Three of these encode proteins possessing non-C-terminal SH3 domains, indicating that the present invention can be used to identify active domains regardless of their position within a protein. Of the six proteins identified, only three matched GenBank entries. SH3P15 and SH3P16 are Fyn (Kawakami et al., 1988, Proc. Natl. Acad. Sci. USA 85:3870-3874) and Lyn (Yamanashi et al., 1987, Mol. Cell. Biol. 7:237-243), respectively, two Src-family members possessing SH3 domains with ligand preferences similar to that of the Src SH3 domain (Rickles, 1994, EMBO J. 13:5598-5604); and SH3P14 appears to be the human homologue of murine H74, a protein of unknown function. The three remaining proteins did not match entries in GenBank and include the human homolog of SH3P9, described above, and SH3P17 and SH3P18, fragments of two related (85% amino acid similarity) adaptor-like proteins comprised of at least four and three SH3 domains, respectively.

Examination of the primary sequences of the SH3 domains identified in this work reveals several interesting features (see FIG. 10). Positions important for ligand binding by the Src SH3 domain (Feng et al., 1994, Science 266:1241-1247; Lescure et al., 1992, J. Mol. Biol. 228:387-94) and essential for SH3 function in Grb2/Sem5 are conserved (Clark et al., 1992, Nature 356:340-344). In addition, the two gaps in the sequence alignment shown in FIG. 10 correspond to regions of length variation observed among previously characterized SH3 domains. Surprisingly, the SH3 domains identified in this work are not significantly more similar to one another than they are to other known SH3 domains, with the exception of the mouse and human forms of SH3P9 and SH3P14, which are 100% and 83% identical, respectively. This result indicates that SH3 domains can vary widely in primary structure and still bind proline-rich peptide recognition units selectively.

6.1.1. Nucleotide and Corresponding Amino Acid Sequences of Genes Identified from cDNA Expression Libraries

The nucleotide sequences of SH3P1, SH3P2, SH3P3, SH3P4, SH3P5, SH3P6, SH3P7, SH3P8, SH3P9, SH3P10, SH3P11, SH3P12, SH3P13, and SH3P14, the mouse genes identified by screening the 16 day mouse embryo cDNA expression library with the peptides pSrcCII and pCort, are shown in FIGS. 18, 20, 22, 24, 26, 28, 30, 32, 34, 38, 40, 42A and B, 44, and 46A and B, respectively. The corresponding amino acid sequences of the mouse genes SH3P1, SH3P2, SH3P3, SH3P4, SH3P5, SH3P6, SH3P7, SH3P8, SH3P9, SH3P10, SH3P11, SH3P12, SH3P13, and SH3P14 are shown in FIGS. 19, 21, 23, 25, 27, 29, 31, 33, 35, 39, 41, 43, 45, and 47, respectively.

The nucleotide sequences of SH3P9, SH3P14, SH3P17, and SH3P18, human genes identified by screening the human bone marrow and human prostate cancer cDNA expression libraries with the peptide T12SRC.1, are shown in FIGS. 36, 48, 50, and 52, respectively. The corresponding amino acid sequences of the human genes SH3P9, SH3P14, SH3P17, and SH3P18 are shown in FIGS. 37, 49, 51, and 53, respectively.

Two genes, SH3P9 and SH3P14, were isolated from both mouse and human libraries.

The sequences of SH3P15 and SH3P16 are not shown. SH3P15 is Lyn and SH3P16 is Fyn.

FIG. 54 shows the nucleotide sequence of clone 55, a novel human gene identified and isolated from a human bone marrow cDNA library (described in Section 6.1) using as recognition units a mixture of T12SRC.4 and pCort (described in Section 6.1) and the methods described in Section 6.1.

FIG. 55 shows the amino acid sequence of clone 55.

FIG. 56 shows the nucleotide sequence of clone 56, a novel human gene identified and isolated from a human bone marrow cDNA library (described in Section 6.1) using as recognition units a mixture of T12SRC.4 and pCort (described in Section 6.1) and the methods described in Section 6.1.

FIG. 57 shows the amino acid sequence of clone 56.

FIG. 58A shows the nucleotide sequence from position 1-1720 and FIG. 58B shows the nucleotide sequence from position 1720-2873 of clone 65, a novel human gene identified and isolated from a human bone marrow cDNA library (described in Section 6.1) using as recognition units a mixture of P53BP2.Con and Nck1.Con3 and the methods described in Section 6.1. P53BP2.Con and Nck1.Con3 are peptides, the amino acid sequences of which are biotin-SFAAPARPPVPPRKSRPGG-NH₂ (SEQ ID NO:201) and biotin-SFSFPPLPPAPGG-NH₂ (SEQ ID NO:202), respectively. The sequences of P53BP2.Con and Nck1.Con3 are consensus sequences of recognition units that bind to the SH3 domains of p53 bp2 and Nck, respectively.

FIG. 59 shows the amino acid sequence of clone 65.

FIG. 60 shows the nucleotide sequence of clone 34, a novel human gene identified and isolated from a human prostate cancer cDNA library (described in Section 6.1) using as recognition units a mixture of T12SRC.1 and T12SRC.4 (described in Section 6.1) and the methods described in Section 6.1.

FIGS. 61A and 61B show the amino acid sequence of clone 34.

FIG. 62 shows the nucleotide sequence of clone 41, a novel human gene identified and isolated from a human bone marrow cDNA library (described in Section 6.1) using as recognition units a mixture of PXXP.NCK.S1/4 and PXXP.ABL.G1/2M and the methods described in Section 6.1. PXXP.NCK.S1/4 and PXXP.ABL.G1/2M are peptides, the amino acid sequences of which are biotin-SRSLSEVSPKPPIRSVSLSR-NH₂ (SEQ ID NO:222) and biotin-SRPPRWSPPPVPLPTSLDSR-NH₂ (SEQ ID NO:223), respectively. PXXP.NCK.S1/4 and PXXP.ABL.G1/2M bind to the SH3 domains of Nck and Abl, respectively.

FIGS. 63A and 63B show the amino acid sequence of clone 41.

FIG. 64 shows the nucleotide sequence of clone 53, a novel human gene identified and isolated from a human prostate cancer cDNA library (described in Section 6.1) using as recognition units a mixture of PXXP.NCK.S1/4 and PXXP.ABL.G1/2M and the methods described in Section 6.1.

FIGS. 65A and 65B show the amino acid sequence of clone 53.

FIGS. 66A and 66B show the nucleotide and amino acid sequence of clone 5, a novel human gene identified and isolated from a HeLa cell cDNA library using as recognition units a mixture of T12SRC.1 and T12SRC.4 (described in Section 6.1) and the methods described in Section 6.1.

6.2. Use of Peptides Resembling SH3 Domain Binding Sequences as Recognition Units

We inspected a number of published amino acid sequences and identified proline-rich stretches of amino acids that resembled consensus SH3 domain binding sequences. Peptides comprising these proline-rich sequences were synthesized and tested by the methods of the present invention for their ability to specifically bind to the novel SH3 domains described in Sections 6.1 and 6.1.1. Purified SH3 domain-containing clones were spotted on a lawn of Y1090 host cells, grown for an appropriate amount of time, and plaque filter lifts were screened with biotinylated peptides complexed with streptavidin-alkaline phosphatase as described in Section 6.1.

The results are shown in FIGS. 12 and 13. As can be seen, in many cases the synthesized peptides were able to bind to the novel SH3 domains. This indicates that those synthesized peptides could have been used to identify those novel SH3 domains from sources of polypeptides.

6.3. Valency of Peptide Recognition Units Affects Specificity of Recognition Units

6.3.1. Preconjugation of Peptide Recognition Units with Streptavidin-Alkaline Phosphatase Increases Affinity of the Recognition Units for Targets

As a preliminary test of the effect of the valency of peptide recognition units on the ability of those recognition units to be used as probes to detect SH3 domains, biotinylated peptides that had been previously shown to bind the SH3 domains of either Src or Abl were tested for their ability to bind their respective SH3 domain when either preconjugated with streptavidin-alkaline phosphatase (SA-AP) or not so preconjugated. GST-SrcSH3 and GST-AblSH3 fusion proteins (produced as described in Sparks et al., 1994, J. Biol. Chem. 269:23853-23856) were resolved by 10% SDS-PAGE and transferred to Immobilon D nylon membranes (Millipore, New Bedford, Mass.). The membranes were incubated in blocking solution for 1 hr at 25° C. and then incubated overnight at 4° C. with either biotinylated Src SH3 domain-binding peptides or biotinylated Abl SH3 domain-binding peptides in either multivalent (SA-AP) or monovalent format. The filters were washed three times (15 min each wash) in PBS/T and incubated with NBT and BCIP for color development. See Section 6.1 for further details of the detection process.

The results are shown in FIG. 14. In panels A, the biotinylated peptides were preconjugated with SA-AP and then allowed to bind to the immobilized SH3 domains. Preconjugation was as described in Section 6.1. In panels B, the peptides were first allowed to bind to the immobilized SH3 domains and then the bound peptides were detected by adding SA-AP. In both cases, color development was as in Section 6.1. The sequences of the peptides used were: Biotin-SGSGGILAPPVPPRNTR (SEQ ID NO:1) for the Src specific peptide and Biotin-SGSGSRPPRWSPPPVPLPTSLDSR (SEQ ID NO:41) for the Abl specific peptide. The results shown in FIG. 14 demonstrate that preconjugation with SA-AP dramatically increases the strength of the signal detected.

6.3.2. Preconjugation of Peptide Recognition Units with Streptavidin-Alkaline Phosphatase Results in Recognition of a Variety of SH3 Domains

Two μg of each of a panel of GST-SH3 domain fusion proteins were transferred to Immobilon D nylon membranes (Millipore, New Bedford, Mass.) using a dot-blot apparatus. Biotinylated Src, Abl, or Cortactin SH3 domain-binding peptides were preconjugated to SA-AP and incubated with the filter; an alkaline phosphatase-driven color reaction was used to detect peptide binding. The panel of immobilized proteins was also reacted with a polyclonal anti-GST antibody (Pharmacia, Piscataway, N.J.). Sequences of the Src, Abl, and Cortactin-binding peptides were Biotin-SGSGVLKRPLPIPPVTR (SEQ ID NO:42), Biotin-SGSGSRPPRWSPPPVPLPTSLDSR (SEQ ID NO:41), and Biotin-SGSGSRLGEFSKPPIPQKPTWMSR (SEQ ID NO:43), respectively.

As can be seen from the results shown in FIG. 15, the preconjugated biotinylated peptides recognized not only their original target SH3 domains, but related domains as well. The Src peptide recognized the SH3 domains of Yes and Cortactin as well as the SH3 domain of Src; the Abl peptide recognized the Cortactin SH3 domain as well as the Abl SH3 domain; and the Cortactin peptide recognized Src, Yes, Abl, Crk, and the C-terminal Grb2 SH3 domains as well as recognizing the Cortactin SH3 domain.

The above experiment was performed utilizing SH3 domains that had been immobilized on nylon membranes. The following demonstrates that preconjugation with streptavidin also permits peptide recognition units to recognize a variety of SH3 domains when those domains are immobilized in the wells of a microtiter plate.

Five different peptide recognition units (pAbl, pPLC, pCrk, pSrcCI, pSrcCII) were tested in either multivalent or monovalent format for their ability to bind to seven different SH3 domains (Src, Abl, PLCγ, Crk, Cortactin, Grb2N, Grb2C) in an ELISA. The sequences of these peptides were as follows: pAbl, SGSGSRPPRWSPPPVPLPTSLDSR (SEQ ID NO:41); pPLC, SGSGSMPPPVPPRPPGTLGG (SEQ ID NO:66); pCrk, SGSGNYVNALPPGPPLPAKNGG (SEQ ID NO:67); pSrcCI, SGSGVLKRPLPIPPVTR (SEQ ID NO:42); pSrcCII, SGSGGILAPPVPPRNTR (SEQ ID NO:1). These peptides were biotinylated as in Section 6.1.
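The proline-rich character of the five peptides listed above can be checked with a short script. The following is a minimal sketch only: it assumes that the PXXP core (proline, any two residues, proline), suggested by peptide names such as PXXP.NCK.S1/4 used elsewhere in this document, is the feature of interest. The function names `find_pxxp` and `proline_fraction` are hypothetical and are not part of the described method.

```python
import re

# Peptide sequences copied from the paragraph above (biotin tag omitted).
PEPTIDES = {
    "pAbl": "SGSGSRPPRWSPPPVPLPTSLDSR",
    "pPLC": "SGSGSMPPPVPPRPPGTLGG",
    "pCrk": "SGSGNYVNALPPGPPLPAKNGG",
    "pSrcCI": "SGSGVLKRPLPIPPVTR",
    "pSrcCII": "SGSGGILAPPVPPRNTR",
}

# Assumption: PXXP is the motif of interest. The zero-width lookahead
# allows overlapping occurrences (e.g. PPVPP) to be reported separately.
PXXP = re.compile(r"(?=(P..P))")


def find_pxxp(sequence):
    """Return (0-based start, tetrapeptide) for every PXXP occurrence."""
    return [(m.start(), m.group(1)) for m in PXXP.finditer(sequence)]


def proline_fraction(sequence):
    """Return the fraction of residues in the sequence that are proline."""
    return sequence.count("P") / len(sequence)


if __name__ == "__main__":
    for name, seq in PEPTIDES.items():
        hits = find_pxxp(seq)
        print(f"{name}: {len(hits)} PXXP hit(s) {hits}, "
              f"proline fraction {proline_fraction(seq):.2f}")
```

Each peptide contains at least one PXXP occurrence under this definition; the scan says nothing about binding affinity or specificity, which the ELISA described below addresses.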

The SH3 domains were produced as GST-SH3 fusion proteins as described in Sparks et al., 1994, J. Biol. Chem. 269:23853-23856. Their purity and concentration were confirmed by SDS-PAGE and Bradford protein assays, respectively. The GST-SH3 fusion proteins were immobilized in the wells of microtiter plates as follows: Two micrograms of each GST-SH3 fusion protein were incubated in wells of a flat bottom enzyme-linked immunosorbent assay (ELISA) microtiter plate (Costar, Cambridge, Mass.) in 100 mM NaHCO₃ for 1 hr at 25° C. One volume of SuperBlock blocking buffer (Pierce Chemical Co., Rockford, Ill.) was added to each well and incubated for an additional 30 min. Plates were washed three times with PBS/0.1% Tween-20/0.1% bovine serum albumin (BSA). Immobilized proteins were detected with SH3 domain-binding peptides in multivalent or monovalent formats using streptavidin-horseradish peroxidase (SA-HRP; Sigma Chemical Co., St. Louis, Mo.). For complexation of the biotinylated peptides and SA-HRP, peptide and SA-HRP concentrations were as described for SA-AP complexation in Section 6.1, but all incubations and washes were in PBS/0.1% Tween-20/0.1% BSA. Plates were washed five times before colorimetric reaction and before the addition of SA-HRP (monovalent format). The amount of bound SA-HRP was evaluated with the addition of 100 μl horseradish peroxidase substrate [2,2′-azino-bis(3-ethylbenzthiazoline-6-sulfonic acid) (ABTS), 0.05% hydrogen peroxide, 50 mM sodium citrate, pH 5.0]. After 5-30 minutes of reaction time, the optical densities (OD) of the microtiter plate wells were measured with a microtiter plate scanner (Molecular Devices, Sunnyvale, Calif.) set for 405 nm wavelength. The results are shown in FIG. 8. From FIG. 8 it can be seen that the tetravalent (complexed) peptides display both increased affinity and broadened specificity toward SH3 targets. Binding of complexed peptides was, however, still restricted to SH3 domains; the complexes bind to neither GST (FIG. 8) nor other unrelated proteins (data not shown). Thus, precomplexation with SA-AP decreases the specificity of the peptide recognition units but does not make the peptides non-specific. Rather, the peptides, when precomplexed, recognize a variety of SH3 domains in addition to their target domains.

6.3.3. Preconjugation of Peptide Recognition Units with Streptavidin-Alkaline Phosphatase Results in Recognition of a Variety of Expressed cDNA Clones

Lambda phage clones of genes containing a variety of SH3 domains were isolated from screens of a 16 day mouse embryo cDNA expression library (Novagen, Madison, Wis.). For a description of the isolation of these cDNA clones, see Section 6.1. Phage particles corresponding to individual lambda phage cDNA recombinants were spotted onto 2xYT-1.5% agar petri plates onto which had been poured 3 ml of 2xYT-0.8% agarose with 100 μl of a BL21(DE3) pLysE E. coli culture grown overnight. After a 6 hr incubation at 37° C., expression of the cDNA segments was induced with IPTG-soaked nitrocellulose filters. After overnight incubation, the expressed proteins had been transferred to the filters and the filters were then incubated with either biotinylated SH3-domain binding peptides preconjugated to SA-AP or a monoclonal antibody recognizing the T7-Tag fusion peptide (αT7.10Mab; Novagen, Madison, Wis.). This antibody was used as a positive control since it recognized an epitope expressed by all the clones (part of the ø10 leader sequence common to all λEXlox recombinants). Sequences of pSrcI, pSrcII, Cortactin, and CaM (Calmodulin binding) peptides were Biotin-SGSGVLKRPLPIPPVTR (SEQ ID NO:42), Biotin-SGSGGILAPPVPPRNTR (SEQ ID NO:1), Biotin-SGSGSRLGEFSKPPIPQKPTWMSR (SEQ ID NO:43), and Biotin-STVPRWIEDSLRGGAARAQTRLASAK (SEQ ID NO:44), respectively.

The results are shown in FIG. 16. From FIG. 16 it can be seen that precomplexation with SA-AP decreases the specificity of the peptide recognition units but does not make the peptides non-specific; none of the peptides react in a significant fashion with two negative control sequences, α-actinin and calmodulin (CaM). Rather, the peptides, when precomplexed, recognize a variety of SH3 domain-containing cDNA clones in addition to clones containing their target domains.

6.4. Characterization of cDNA Clone-Encoded Proteins

6.4.1. Production of cDNA Clone-Encoded Proteins

Purified DNA from all positive cDNA clones (ca. 18-20 positive clones per recognition unit) was used to transform chemically competent BL21 cells (Hanahan et al., 1983, J. Mol. Biol. 166:557-580, the complete disclosure of which is incorporated by reference herein).

Colonies that appeared after growth overnight at 37° C. on 2xYT agar plates containing 100 μg/ml ampicillin were used to inoculate 4 ml cultures of 2xYT/amp. After 7 hours of incubation at 37° C. with shaking, IPTG was added to each culture to a final concentration of 100 μM. After an additional 2 hours of incubation, 1 ml of each culture was collected and centrifuged to pellet the cells. Cell pellets were resuspended in 400 μl 1×SDS/DTT loading buffer and boiled at 100° C. for 5 min. The resulting cell lysates were subjected to Sodium Dodecyl Sulfate-Polyacrylamide Gel Electrophoresis (SDS-PAGE) on an 8% acrylamide gel. Gels were either Coomassie stained or transferred to Immobilon D membrane (Millipore) and blotted (Towbin et al., 1979, Proc. Natl. Acad. Sci. 76:4350-4354).

6.5. Materials Used in Sections 6.1, 6.2, 6.3.1, 6.3.2, 6.3.3, and 6.4.1

Blocking Solution

Hepes (pH 8)      20 mM
MgCl₂             5 mM
KCl               1 mM
Dithiothreitol    5 mM
Milk Powder       5% w/v

2xYT Media (1L)

Bacto tryptone    16 g
Yeast Extract     10 g
NaCl              5 g

2xYT Agar Plates

2xYT + 15 g agar/L

2xYT Top Agarose (0.8%)

2xYT + 8 g agarose/L
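Section 6.3.3 describes the plates as 2xYT-1.5% agar and the top layer as 2xYT-0.8% agarose. As a minimal sketch, the conversion between the g/L amounts given in these recipes and % w/v (grams per 100 ml) can be written as follows; the function name is hypothetical.

```python
def grams_per_litre_to_percent_wv(grams_per_litre):
    """Convert g/L to % w/v; % w/v is grams per 100 ml, so divide by 10."""
    return grams_per_litre / 10.0


if __name__ == "__main__":
    print("agar, 15 g/L ->", grams_per_litre_to_percent_wv(15), "% w/v")
    print("agarose, 8 g/L ->", grams_per_litre_to_percent_wv(8), "% w/v")
```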

SDS/DTT Loading Buffer

(10 mL of 5x solution)

0.5 M Tris base            0.61 g
8.5% SDS                   0.85 g
27.5% sucrose              2.75 g
100 mM DTT                 0.154 g
0.03% Bromophenol Blue     3.0 mg
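The masses in the loading buffer recipe above follow from the stated concentrations and the 10 mL volume. The following is a minimal sketch of that arithmetic. The molar masses used (Tris base about 121.14 g/mol, DTT about 154.25 g/mol) are assumptions that do not appear in this document, and the helper names are hypothetical.

```python
# Assumed molar masses (g/mol); these values are not given in the source text.
MOLAR_MASS = {"Tris base": 121.14, "DTT": 154.25}

VOLUME_ML = 10.0  # "10 mL of 5x solution"


def grams_for_molar(conc_mol_per_l, volume_ml, molar_mass):
    """Mass (g) needed for a molar concentration in the given volume."""
    return conc_mol_per_l * (volume_ml / 1000.0) * molar_mass


def grams_for_percent_wv(percent, volume_ml):
    """Mass (g) needed for a % w/v concentration (grams per 100 mL)."""
    return percent / 100.0 * volume_ml


recipe_g = {
    "Tris base": grams_for_molar(0.5, VOLUME_ML, MOLAR_MASS["Tris base"]),
    "SDS": grams_for_percent_wv(8.5, VOLUME_ML),
    "sucrose": grams_for_percent_wv(27.5, VOLUME_ML),
    "DTT": grams_for_molar(0.100, VOLUME_ML, MOLAR_MASS["DTT"]),
    "Bromophenol Blue": grams_for_percent_wv(0.03, VOLUME_ML),
}

if __name__ == "__main__":
    for component, grams in recipe_g.items():
        print(f"{component}: {grams:.4f} g")
```

Under these assumptions the computed masses agree with the recipe to the precision stated there.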

Overnight Cell Cultures:

Inoculate media with one isolated colony of appropriate cell type and incubate at 37° C. O/N with shaking.

BL21 (DE3) pLysE

2xYT media

maltose            0.2%
MgSO₄              10 mM
Chloramphenicol    25 μg/mL

BM25.8

2xYT media

maltose            0.2%
MgSO₄              10 mM
Chloramphenicol    34 μg/ml
Kanamycin          50 μg/ml

6.6. Other Functional Domains and Recognition Units

In a manner similar to that described above for SH3 domains, recognition units directed to other functional domains of interest can be chosen for use in the present method. For example, as recognition units for a study of GST functional domains, the following GST-binding peptides can be used to screen a plurality of polypeptides: Class I: CWSEWDGNEC (SEQ ID NO:46), CGQWADDGYC (SEQ ID NO:47), CEOWDGYGAC (SEQ ID NO:48), CWPFWDGSTC (SEQ ID NO:49), CMIWPDGEEC (SEQ ID NO:50), CESOWDGYDC (SEQ ID NO:51), CQQWKEDGWC (SEQ ID NO:52), or CLYOWDGYEC (SEQ ID NO:53); Class II: CMGDNLGDDC (SEQ ID NO:54), CMGDSLGOSC (SEQ ID NO:55), CMDDDLGKGC (SEQ ID NO:56), CMGENLGWSC (SEQ ID NO:57), or CLGESLGWMC (SEQ ID NO:58).

Moreover, the following SH2-binding peptides can be used according to the methods of the present invention to identify SH2 domain-containing polypeptides: GDGYEEISP (SEQ ID NO:59) (for Src family), GDGYDEPSP (SEQ ID NO:60) (for Nck), GDGYDHPSP (SEQ ID NO:61) (for Crk), GDGYVIPSP (SEQ ID NO:62) (for PLCγN), GDGYQNYSP (SEQ ID NO:63) (for PLCγC), GDGYMAMSP (SEQ ID NO:64) (for p85PI3KN and p85PI3KC), or GDGQNYSP (SEQ ID NO:65) (for Grb2). See Songyang et al., 1993, Cell 72:767-778, the complete disclosure of which is incorporated by reference herein.

Further, polypeptides with a “PH” functional domain (analogous to the proteins Vav, Bcr, mSos, PLCδ, Atk, or Pleckstrin) can be identified using PH-binding peptides, such as those described by Mayer et al., Cell 73:629-630, the complete disclosure of which is incorporated by reference herein.

Other recognition units can be readily contemplated, including other synthetic, semisynthetic, or naturally derived molecules.

The present invention is not to be limited in scope by the specific embodiments described herein. Indeed, various modifications of the invention in addition to those described herein will become apparent to those skilled in the art from the foregoing description and accompanying figures. Such modifications are intended to fall within the scope of the appended claims.

Various publications are cited herein, the disclosures of which are incorporated by reference in their entireties.

1. The method of identifying a polypeptide comprising a functional domain of interest comprising: (a) contacting a multivalent recognition unit complex with a plurality of polypeptides; and (b) identifying a polypeptide having a selective binding affinity for said recognition unit complex.
 2. The method of claim 1 in which said plurality of polypeptides is from a polypeptide expression library.
 3. The method of claim 1 in which said plurality of polypeptides is obtained from a virus.
 4. The method of claim 2 in which said expression library is a cDNA expression library.
 5. The method of claim 2 in which said expression library is a genomic DNA library.
 6. The method of claim 2 in which said expression library is a recombinant bacteriophage library.
 7. The method of claim 6 in which said recombinant bacteriophage library is a recombinant M13 library.
 8. The method of claim 2 in which said expression library is a recombinant plasmid or cosmid library.
 9. The method of claim 1 in which the recognition unit is a peptide.
 10. The method of claim 1 in which said recognition unit is a peptide having less than about 140 amino acid residues.
 11. The method of claim 1 in which said recognition unit is a peptide having less than about 100 amino acid residues.
 12. The method of claim 1 in which said recognition unit is a peptide having less than about 70 amino acid residues.
 13. The method of claim 1 in which said recognition unit is a peptide having about 6 to 60 amino acid residues.
 14. The method of claim 1 in which said recognition unit is a peptide having 20 to 50 amino acid residues.
 15. The method of claim 1 in which the valency of the recognition unit in the complex is at least two.
 16. The method of claim 9 in which the valency of the recognition unit in the complex is at least two.
 17. The method of claim 1 in which the valency of the recognition unit in the complex is at least four.
 18. The method of claim 9 in which the valency of the recognition unit in the complex is at least four.
 19. The method of claim 17 in which the recognition unit complex is a complex comprising (a) avidin or streptavidin, and (b) biotinylated recognition units.
 20. The method of claim 18 in which the recognition unit complex is a complex comprising (a) avidin or streptavidin, and (b) the biotinylated peptides. 21-102. (canceled) 